Products

A curated catalog of every model worth paying for.

We list models only after we've benchmarked them, set up a routing pool, and convinced ourselves the inference path is stable enough for production. No vapor entries, no deprecated SKUs lingering in the docs.

Frontier tier · premium reasoning

When the answer matters more than the dollar.

Top-of-stack models for complex reasoning, long-form synthesis, and high-stakes tool-using agents. Priced higher per token, but the per-task economics often beat the mid-tier once you account for retry rates and human review.
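To make the per-task claim concrete, here is a rough sketch of the arithmetic. Every number in it is hypothetical and chosen only for illustration; substitute your own measured pass rates, review rates, and costs before drawing conclusions.

```python
def cost_per_completed_task(
    attempt_cost: float,   # model spend for one attempt, in dollars
    success_rate: float,   # fraction of attempts that pass without a retry
    review_rate: float,    # fraction of tasks escalated to a human
    review_cost: float,    # dollars per human review
) -> float:
    """Expected dollars per completed task, assuming independent retries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / success_rate  # mean of a geometric distribution
    return attempt_cost * expected_attempts + review_rate * review_cost


# Hypothetical inputs, not measured figures.
frontier = cost_per_completed_task(0.040, 0.95, 0.02, 2.00)
mid_tier = cost_per_completed_task(0.010, 0.70, 0.10, 2.00)
print(f"frontier: ${frontier:.3f} per task, mid-tier: ${mid_tier:.3f} per task")
```

With these made-up inputs the frontier model costs four times as much per attempt but comes out cheaper per completed task, because human review dominates the mid-tier total.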

Gemini 3.1 Pro
Frontier · 1M ctx · multimodal

One-million-token context window with strong native multimodal understanding. Our preferred choice for repository-scale code review and long PDF analysis.

text vision audio tools 1M context
GPT 5.5
Frontier · 1M ctx · generalist

The pragmatic default. Best-in-class structured outputs, mature function calling, and the most predictable latency profile under bursty traffic.

text vision tools json mode
Claude Opus 4.7
Frontier · 1M ctx · long-form

Strongest long-form reasoning and instruction adherence we test against. Pairs well with agentic frameworks and is our top pick for code-writing pipelines.

text vision tools artifacts
Balanced tier · production workhorses

Where most of your traffic should land.

Frontier-adjacent quality at a fraction of the per-token cost. These are the models we recommend for the bulk of customer-facing surfaces — chat, drafting, summarization, retrieval-augmented Q&A.

Gemini 3 Flash
Balanced · 1M ctx · fast

Long-context understanding at sub-second TTFT. Excellent for streaming UIs, where users see the first token almost as soon as they hit send.

text vision audio tools fast
GPT-5.4 mini
Balanced · 400K ctx · efficient

The same generalist instincts as its bigger sibling, tuned for lower cost and higher throughput. Solid retry target when frontier capacity is saturated.

text vision tools
Claude Sonnet 4.6
Balanced · 1M ctx · agentic

A favorite for agent loops: strong tool use, a low refusal rate on ambiguous prompts, and very stable behavior across long multi-turn sessions.

text vision tools agents
Efficient tier · open weights & specialists

Volume work, settled economics.

When unit economics decide whether a feature ships at all. Open-weights flagships and purpose-built specialists, available on shared capacity or as reserved replicas.

Llama 4 Maverick
Open · 1M ctx · flagship

Open-weights flagship with strong general capability. Choose dedicated replicas for data residency, or shared capacity for the best per-token economics.

text vision open-weights dedicated
DeepSeek V4
Specialist · 1M ctx · code & math

Punches above its price on code and math benchmarks. The right call for cost-sensitive developer-tooling pipelines that still demand top-decile output.

text code value
Qwen3-Max
Multilingual · 256K ctx · open

Strongest multilingual coverage in the catalog, especially for Chinese, Japanese, and Korean. Pairs well with vision tasks for cross-border consumer surfaces.

text multilingual
Mistral Large 3
Balanced · 256K ctx · EU-hosted

European-hosted option with reliable instruction following and competitive pricing. Often the simplest answer for EU data-residency requirements.

text tools eu region
Command R+
Specialist · 128K ctx · RAG

Purpose-built for retrieval-augmented generation. Excellent citation behavior and grounded responses make it our top pick for knowledge-base applications.

text rag citations
Phi-4
Compact · 16K ctx · edge

A small-but-capable specialist for classification, intent routing, and structured extraction at scale. Costs pennies; runs everywhere.

text compact cheap
Use cases

Shapes of work we route well.

A benchmark score doesn't tell you which model will hold up under your traffic. Here's how teams compose the catalog in production.

01 · Agents & copilots

Long-running tool-using sessions.

Sonnet or GPT 5.5 for the loop; Opus for hard subtasks; an efficient specialist for the structured tool arguments. We expose a single conversation ID and stitch traces so you can audit every hop.
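One way to picture that split is a small per-step router on the client side. This is only a sketch: the `Step` shape, the difficulty score, and the threshold are all hypothetical, and of the model IDs below only `claude-sonnet-4.6` appears in our integration sample; the other two are assumed spellings.

```python
from dataclasses import dataclass

LOOP_MODEL = "claude-sonnet-4.6"  # ID taken from the integration sample
HARD_MODEL = "claude-opus-4.7"    # assumed ID for the frontier model
ARGS_MODEL = "phi-4"              # assumed ID for an efficient specialist


@dataclass
class Step:
    kind: str          # "reason" or "tool_args" (hypothetical labels)
    difficulty: float  # 0..1 score from your own planner (hypothetical)


def pick_model(step: Step, hard_threshold: float = 0.8) -> str:
    """Choose a model for one hop of the agent loop."""
    if step.kind == "tool_args":
        return ARGS_MODEL   # structured arguments: cheap and fast
    if step.difficulty >= hard_threshold:
        return HARD_MODEL   # escalate hard subtasks
    return LOOP_MODEL       # everything else stays in the main loop


print(pick_model(Step("reason", 0.3)))
print(pick_model(Step("reason", 0.9)))
print(pick_model(Step("tool_args", 0.9)))
```

Keeping the choice in one function makes it easy to log which model handled each hop alongside the conversation ID.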

02 · RAG & search

Grounded answers with citations.

Command R+ as the default; Gemini 3 Flash when retrieved context outgrows Command R+'s 128K window; Phi-4 for the embedding-adjacent classifier work. Streaming JSON keeps your search UI responsive even on multi-document grounding.
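A minimal sketch of the context-size decision, assuming you make it client-side before sending the request. The window sizes come from the catalog entries above, rounded to 128,000 and 1,000,000 tokens; the model IDs and the four-characters-per-token estimate are assumptions, and a real tokenizer will give different counts.

```python
# (model ID, context window in tokens); IDs are assumed spellings.
CATALOG = [
    ("command-r-plus", 128_000),    # default: built for grounded answers
    ("gemini-3-flash", 1_000_000),  # overflow: long-context fallback
]


def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return len(text) // 4 + 1


def pick_rag_model(question: str, documents: list[str], output_reserve: int = 4_000) -> str:
    """Return the first catalog model whose window fits the prompt plus a reply."""
    needed = (
        estimate_tokens(question)
        + sum(estimate_tokens(doc) for doc in documents)
        + output_reserve
    )
    for model, window in CATALOG:
        if needed <= window:
            return model
    raise ValueError(f"{needed} tokens exceeds every window; trim or re-rank the documents")
```

Ordering the list by preference rather than by size keeps the default model first and uses the larger window only when it is needed.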

03 · Code generation

Editor-quality code across repos.

Claude Opus 4.7 or Gemini 3.1 Pro on the writing side, DeepSeek V4 on the verification side. We route diff-style requests to whichever pool has the lowest TTFT in your region — the user just sees fast.
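For intuition only, lowest-TTFT selection can be sketched as picking the pool with the smallest median over recent samples. This is not a description of our router; the pool names and timings are invented, and the median is just one reasonable way to ignore a single slow outlier.

```python
from statistics import median


def fastest_pool(ttft_samples: dict[str, list[float]]) -> str:
    """Return the pool whose recent TTFT samples (seconds) have the lowest median."""
    candidates = {pool: median(s) for pool, s in ttft_samples.items() if s}
    if not candidates:
        raise ValueError("no pool has TTFT samples yet")
    return min(candidates, key=candidates.get)


# Hypothetical pools and timings.
samples = {
    "pool-a": [0.42, 0.38, 0.51],
    "pool-b": [0.29, 0.95, 0.31],  # one slow outlier, still the best median
    "pool-c": [],                  # no data yet, so it is skipped
}
print(fastest_pool(samples))
```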

04 · Customer-facing chat

Predictable cost per conversation.

Balanced tier as the default, with auto-failover into the efficient tier under saturation. Per-tenant rate budgets prevent one heavy user from starving the rest.
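If you want the same two behaviors in your own client, a rough sketch might look like the following. Both pieces are illustrative rather than how the platform implements them: `TenantBudget` is a plain token bucket, `Saturated` is a stand-in exception (with the OpenAI SDK, an HTTP 429 would most likely surface as `openai.RateLimitError`), and `call` is a hypothetical function you supply that sends one request.

```python
import time
from typing import Any, Callable, Sequence


class Saturated(Exception):
    """Stand-in for whatever error your client raises when a pool is full."""


class TenantBudget:
    """Token bucket: refills `rate` requests per second, stores at most `burst`."""

    def __init__(self, rate: float, burst: int, clock: Callable[[], float] = time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def complete_with_failover(
    call: Callable[[str, list], Any], models: Sequence[str], messages: list
) -> tuple[str, Any]:
    """Try each model in order; move to the next one only on saturation."""
    last_error = None
    for model in models:
        try:
            return model, call(model, messages)
        except Saturated as err:
            last_error = err
    raise RuntimeError("every pool in the failover list is saturated") from last_error
```

Keep one `TenantBudget` per tenant ID, check `allow()` before each request, and pass the models in preference order, balanced tier first and efficient tier second.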

05 · Batch & pipelines

Cost-first asynchronous workloads.

Open-weights flagships with dedicated capacity and overnight-window scheduling. We can quote pricing per million completed jobs, not per token, when that's the shape your finance team prefers.

Integration

Drop-in. Not rip-and-replace.

The API mirrors the OpenAI REST schema, so existing SDKs and tooling work with no changes beyond configuration: point the base URL at our endpoint, swap in your API key, and you're in.

Python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.idclinks.com/v1",
    api_key=os.environ["IDCLINKS_KEY"],
)

resp = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)

for chunk in resp:
    # delta.content is None on role-only and final chunks; print "" instead of "None".
    print(chunk.choices[0].delta.content or "", end="")
TypeScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.idclinks.com/v1",
  apiKey: process.env.IDCLINKS_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Hi" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}

Need a model that's not on the list?

We add models when customers commit volume against them. Tell us what you'd ship if it were available, and we'll come back with a timeline.