A unified inference gateway

One API.
Every frontier
model.

idclinks is a single endpoint for the world's most capable language and multimodal models. Switch between providers with a string, pay predictable rates, and ship without renegotiating contracts every quarter.

Contact sales → Browse models

All systems normal v2026.05 release SOC 2 type II

Gemini 3.1 Pro GPT 5.5 Claude Opus 4.7 Llama 4 Maverick Mistral Large 3 DeepSeek V4 Qwen3-Max Command R+ GPT-5.4 mini Claude Sonnet 4.6 Gemini 3 Flash Phi-4 Gemini 3.1 Pro GPT 5.5 Claude Opus 4.7 Llama 4 Maverick Mistral Large 3 DeepSeek V4 Qwen3-Max Command R+ GPT-5.4 mini Claude Sonnet 4.6 Gemini 3 Flash Phi-4

Built for teams that take infrastructure seriously

SOC 2 TYPE II

GDPR · DPA

HIPAA BAA

ISO 27001*

99.95% SLA

6 REGIONS

Catalog

Discover models available on idclinks.

We test, benchmark, and route to the highest-performing inference path for each model. Filter by capability — you write the model name, we handle the rest.

20% off

g google · gemini

Gemini-3.1-Pro

o openai

GPT-5.5

10% off

a anthropic

Claude-Opus-4.7

20% off

d deepseek

DeepSeek-V4

a anthropic

Claude-Sonnet-4.6

30% off

o openai

GPT-5.4-mini

g google · gemini

Gemini-3-Flash

m meta · llama

Llama-4-Maverick

40% off

q qwen

Qwen3-Max

m mistral

Mistral-Large-3

c cohere

Command-R+

p phi

Phi-4

15% off

x diffuse

Diffuse-XL-3

r recraft

Recraft-V4

r reel

Reel-3-Director

v voicepro

VoicePro-9

See the full catalog →

99.95%

Endpoint uptime, T90

~120ms

P50 time-to-first-token

40+

Models, one schema

Routed regions worldwide

Why teams pick idclinks

Boring infrastructure for an exciting industry.

We don't make the models. We make sure the models you depend on stay reachable, predictable, and accountable — so your roadmap isn't a hostage to anyone else's quota.

01 / 04

One schema across every provider.

OpenAI-compatible REST and streaming. Swap models with a string — the same code path works for completions, tool-calling, vision, and structured JSON. Provider-specific extensions are exposed via documented headers, never silent gotchas.

02 / 04

Smart routing that respects your SLA.

Requests are routed to the lowest-latency healthy replica in your region. Saturated upstream? We failover to the next pool before your retry budget runs out. You see the same model name; we handle the plumbing.

03 / 04

Pricing that survives a finance review.

Per-token pricing in transparent ranges. Volume committed traffic gets dedicated capacity and locked rates for the term. No surprise per-region multipliers, no priority-tier upcharges hiding in a PDF.

04 / 04

Observability you'd build yourself.

Per-request traces with provider, replica, region, token counts, and latency breakdown. Stream them to your existing pipeline via webhook, S3, or OpenTelemetry — no proprietary dashboard you have to live in.

Pricing

Pay-as-you-go ranges. Committed pricing on request.

Per-token rates depend on the model tier, region, and whether you're on shared or dedicated capacity. Below are the public PAYG ranges. Committed customers see lower, locked rates — talk to sales.

Model tier	Example models	Input / 1M tokens	Output / 1M tokens
Frontier premium	Claude Opus 4.7 · Gemini 3.1 Pro · GPT 5.5	$4.50 – $7.50	$18.00 – $30.00	Details
Balanced default	Claude Sonnet 4.6 · Gemini 3 Flash · GPT-5.4 mini	$0.80 – $1.80	$3.00 – $6.50	Details
Efficient	Llama 4 · DeepSeek V4 · Qwen3-Max	$0.20 – $0.80	$0.80 – $2.40	Details
Dedicated capacity	Any catalog model, reserved replicas	Custommonthly term	Custommonthly term	Talk to sales

Ranges reflect the spread between regions and capacity tiers; the rate you pay depends on your route. Exact per-model rates and committed pricing are quoted on request.

FAQ

Things people ask before signing.

How is idclinks different from calling each provider directly?

Three things, mostly. One schema means you stop maintaining adapter code for every new release. Routing keeps your tail latency stable when an upstream brown-outs. And committed-capacity contracts give you reservable throughput that individual providers often won't sell to a single buyer below enterprise scale.

What happens to my data?

Requests are forwarded to the provider you select. We don't train on your traffic and we don't log prompt or completion content by default — only the metadata you'd expect (model, latency, token counts, status). Zero-retention routes are available for regulated workloads on request.

Which regions do you route from?

Six routed regions today: US East, US West, EU Frankfurt, Singapore, Tokyo, and Sydney. Requests are pinned to your nearest healthy region; failover is automatic and surfaced in the response headers so you always know where your tokens were generated.

Can I bring my own provider account?

Yes — BYO-key routes are supported on Enterprise plans. You keep your provider relationship and any negotiated rates; we handle the schema unification, routing, and observability layer on top.

How do I get started?

Email sales@idclinks.com with a short description of your workload — expected volume, the models you care about, and any compliance constraints. We typically respond within one business day with a trial key and a pricing sheet for your shape.

One API. Every frontier model.