We list models only after we've benchmarked them, set up a routing pool, and convinced ourselves the inference path is stable enough for production. No vaporware entries, no deprecated SKUs lingering in the docs.
Top-of-stack models for complex reasoning, long-form synthesis, and high-stakes tool-using agents. Priced higher per token, but the per-task economics often beat the mid-tier once you account for retry rates and human review.
One-million-token context window with strong native multimodal understanding. Our preferred choice for repository-scale code review and long PDF analysis.
The pragmatic default. Best-in-class structured outputs, mature function calling, and the most predictable latency profile under bursty traffic.
Strongest long-form reasoning and instruction adherence we test against. Pairs well with agentic frameworks and is our top pick for code-writing pipelines.
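The per-task economics claimed for this tier can be made concrete with a little arithmetic. The sketch below assumes each attempt succeeds independently with a fixed probability (so the expected number of attempts is 1 / success rate) and that a fixed fraction of tasks goes to human review. Every price, success rate, and review figure here is a hypothetical placeholder, not catalog pricing.

```python
def expected_cost_per_task(
    tokens_per_attempt: int,
    price_per_million_tokens: float,
    success_rate: float,
    review_rate: float,
    review_cost: float,
) -> float:
    """Expected cost of one completed task, with retries and review included."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    # Retrying until success takes 1 / success_rate attempts on average
    # (geometric distribution, assuming independent attempts).
    expected_attempts = 1 / success_rate
    cost_per_attempt = tokens_per_attempt * price_per_million_tokens / 1_000_000
    return expected_attempts * cost_per_attempt + review_rate * review_cost


# Hypothetical inputs for illustration only.
frontier = expected_cost_per_task(20_000, 15.0, success_rate=0.95, review_rate=0.05, review_cost=2.00)
mid_tier = expected_cost_per_task(20_000, 3.0, success_rate=0.70, review_rate=0.25, review_cost=2.00)
print(f"frontier: ${frontier:.3f}  mid-tier: ${mid_tier:.3f}")
```

With these made-up inputs the frontier model comes out cheaper per completed task despite a 5x token price, because review cost dominates. Plug in your own measured retry and review rates before drawing conclusions.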
Frontier-adjacent quality at a fraction of the per-token cost. These are the models we recommend for the bulk of customer-facing surfaces — chat, drafting, summarization, retrieval-augmented Q&A.
Long-context understanding at sub-second TTFT. A good fit for streaming UIs, where the first token appears almost immediately after the request is sent.
The same generalist instincts as its bigger sibling, tuned for lower cost and higher throughput. Solid retry target when frontier capacity is saturated.
A favorite for agent loops: strong tool use, a low refusal rate on ambiguous prompts, and very stable behavior across long multi-turn sessions.
When unit economics decide whether a feature ships at all. Open-weights flagships and purpose-built specialists, available on shared capacity or as reserved replicas.
Open-weights flagship with strong general capability. Choose dedicated replicas for data residency, or shared capacity for unbeatable per-token economics.
Punches above its price on code and math benchmarks. The right call for cost-sensitive developer-tooling pipelines that still demand top-decile output.
Strongest multilingual coverage in the catalog, especially for Chinese, Japanese, and Korean. Pairs well with vision tasks for cross-border consumer surfaces.
European-hosted option with reliable instruction following and competitive pricing. Often the simplest answer for EU data-residency requirements.
Purpose-built for retrieval-augmented generation. Excellent citation behavior and grounded responses make it our top pick for knowledge-base applications.
A small-but-capable specialist for classification, intent routing, and structured extraction at scale. Costs pennies; runs everywhere.
A model name on a benchmark doesn't tell you which one will hold up under your traffic. Here's how teams compose the catalog in production.
Sonnet or GPT 5.5 for the loop; Opus for hard subtasks; an efficient specialist for the structured tool arguments. We expose a single conversation ID and stitch traces so you can audit every hop.
Command R+ as default; Gemini 3 Flash when context windows blow past common limits; Phi-4 for the embedding-adjacent classifier work. Streaming JSON keeps your search UI responsive even on multi-document grounding.
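One way to picture the context-window handoff described above is a small client-side router that estimates prompt size and switches to the long-context model only when the default would overflow. This is a minimal sketch: the model IDs, the 128,000-token limit, and the four-characters-per-token estimate are all assumptions for illustration, so check the catalog for real identifiers and limits.

```python
from typing import List

DEFAULT_MODEL = "command-r-plus"       # hypothetical model ID
LONG_CONTEXT_MODEL = "gemini-3-flash"  # hypothetical model ID
DEFAULT_CONTEXT_LIMIT = 128_000        # assumed limit, in tokens
CHARS_PER_TOKEN = 4                    # rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Crude token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def pick_rag_model(question: str, documents: List[str], reserve_for_output: int = 2_000) -> str:
    """Return the default model unless the grounded prompt would not fit."""
    prompt_tokens = estimate_tokens(question) + sum(estimate_tokens(d) for d in documents)
    if prompt_tokens + reserve_for_output <= DEFAULT_CONTEXT_LIMIT:
        return DEFAULT_MODEL
    return LONG_CONTEXT_MODEL
```

A production router would use the provider's tokenizer rather than a character heuristic, but the decision shape stays the same.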
Claude Opus 4.7 or Gemini 3.1 Pro on the writing side, DeepSeek V4 on the verification side. We route diff-style requests to whichever pool has the lowest TTFT in your region; the user just sees a fast response.
Balanced tier as the default, with auto-failover into the efficient tier under saturation. Per-tenant rate budgets prevent one heavy user from starving the rest.
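The two mechanisms in this pattern, per-tenant budgets and tier failover, can be sketched together. The example below uses a token bucket per tenant and a boolean saturation signal. It illustrates the general idea only and is not a description of how the platform implements it; the class name, tier labels, and `balanced_saturated` flag are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional


class TenantBudget:
    """Token bucket: up to `capacity` requests, refilled at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Refill for elapsed time, then spend one token if available."""
        now = self.clock()
        refill = (now - self.updated) * self.refill_per_sec
        self.tokens = min(self.capacity, self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def route(tenant: str, budgets: Dict[str, TenantBudget], balanced_saturated: bool) -> Optional[str]:
    """Return the tier to use, or None if the tenant is over budget."""
    if not budgets[tenant].try_acquire():
        return None  # caller would typically answer with HTTP 429
    return "efficient" if balanced_saturated else "balanced"
```

Because each tenant draws from its own bucket, a burst from one tenant exhausts only that tenant's budget and leaves the others untouched.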
Open-weights flagships with dedicated capacity and overnight-window scheduling. We can quote pricing per million completed jobs, not per token, when that's the shape your finance team prefers.
The API mirrors the OpenAI REST schema, so existing SDKs and tooling work without code changes beyond configuration: change the base URL and the API key, and you're in.
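import os

from openai import OpenAI

# Point the standard OpenAI SDK at the IDCLinks endpoint.
client = OpenAI(
    base_url="https://api.idclinks.com/v1",
    api_key=os.environ["IDCLINKS_KEY"],
)

resp = client.chat.completions.create(
    model="claude-sonnet-4.6",
    messages=[{"role": "user", "content": "Hi"}],
    stream=True,
)

# Print tokens as they arrive; delta.content can be None on some chunks.
for chunk in resp:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")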
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.idclinks.com/v1",
  apiKey: process.env.IDCLINKS_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-5.5",
  messages: [{ role: "user", content: "Hi" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0].delta.content ?? "");
}