Pricing

Honest ranges. Locked rates on commitment.

Per-token rates depend on the model tier, your region, and whether you're on shared or dedicated capacity. We publish ranges so you can size a workload before talking to anyone — then quote a specific rate when the shape is clear.

Pay-as-you-go

Developer

$0/mo base

Per-token billing across the full catalog. No commitment, no minimums — start with a trial key, scale when you're ready.

All catalog models
Shared inference pools
Streaming, tools, vision, JSON mode
Email support, business hours
10K req/min default ceiling

Request a trial key

Most teams

Scale

From $2k/mo committed

Volume commitment unlocks priority routing, lower per-token rates, and reserved capacity headroom for peak traffic.

Locked per-token rates for the term
Priority routing & failover
Per-tenant rate budgets
SLO with credit-back guarantee
Shared Slack channel

Talk to sales

Reserved capacity

Enterprise

Custom, monthly term

Reserved replicas, BYO-provider, data-residency, zero-retention routes, and the paperwork your security team requires.

Dedicated GPU replicas
Custom regions & data residency
Zero-retention inference routes
BYO-provider key support
SOC 2 Type II artifacts on request

Contact sales

Per-token ranges

Public pay-as-you-go rates.

Ranges represent the spread across regions, capacity types, and load conditions. The rate you actually pay is determined by your route and is shown in the response headers of every request.

Model	Tier	Context	Input / 1M tokens	Output / 1M tokens
Claude Opus 4.7	frontier	1M	$13.00 – $18.00	$65.00 – $90.00
Gemini 3.1 Pro	frontier	1M	$2.20 – $3.50	$13.00 – $18.00
GPT 5.5	frontier	1M	$5.00 – $8.00	$30.00 – $40.00
Claude Sonnet 4.6	balanced	1M	$3.00 – $4.50	$15.00 – $20.00
GPT-5.4 mini	balanced	400K	$0.80 – $1.40	$4.50 – $6.50
Gemini 3 Flash	balanced	1M	$0.55 – $1.00	$3.00 – $4.50
Mistral Large 3	balanced	256K	$0.55 – $1.20	$1.60 – $3.50
Llama 4 Maverick	efficient	1M	$0.20 – $0.55	$0.70 – $1.60
DeepSeek V4	efficient	1M	$0.20 – $0.50	$0.80 – $1.80
Qwen3-Max	efficient	256K	$0.85 – $1.40	$4.00 – $5.50
Command R+	efficient	128K	$2.50 – $3.50	$10.00 – $13.00
Phi-4	efficient	16K	$0.14 – $0.25	$0.50 – $0.80

Ranges in USD. Committed customers see flat locked rates inside this range; the exact rate is set in your order form. Dedicated capacity is priced per replica-month and quoted separately.
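To size a workload against the table, multiply your monthly token volumes by the low and high ends of a model's range. The sketch below is illustrative only: the function name and the example volumes (500M input and 100M output tokens per month) are assumptions, and the prices are the Claude Sonnet 4.6 row above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def monthly_cost_range(input_tokens, output_tokens, input_range, output_range):
    """Return (low, high) monthly cost in USD for a token volume.

    input_range and output_range are (low, high) USD prices per 1M tokens,
    copied from the public table as strings to avoid float rounding.
    """
    low = (input_tokens * Decimal(input_range[0])
           + output_tokens * Decimal(output_range[0])) / MILLION
    high = (input_tokens * Decimal(input_range[1])
            + output_tokens * Decimal(output_range[1])) / MILLION
    return low, high

# Hypothetical workload on Claude Sonnet 4.6:
# 500M input tokens and 100M output tokens per month.
low, high = monthly_cost_range(
    500_000_000, 100_000_000,
    input_range=("3.00", "4.50"),
    output_range=("15.00", "20.00"),
)
# low == 3000 and high == 4250 (USD per month)
```

The result is a planning bound, not a quote. The rate actually applied depends on your route, as described above.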

Add-ons

Pricing for the things around the model.

Most workloads only need the per-token meter. These line items show up for teams who need more control over where their inference runs.

Add-on	What it covers	Price range
Dedicated replica	Reserved GPU capacity for a single model, in a region of your choice.	$3,500 – $14,000 per replica-month
Zero-retention route	Prompt and completion bytes never leave volatile memory. Audit log only.	+10 – 20% on token rates
Data-residency lock	Requests pinned to a specific region with hard failover off.	+5 – 12% on token rates
Premium support	24/7 paging, named contact, quarterly architecture review.	$1,500 – $4,000 per month
BYO-provider routing	Use your own provider key under our schema, routing, and observability layer.	$0.10 – $0.30 per 1K requests
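The two percentage add-ons apply on top of token spend, so their effect can be bounded the same way as the token ranges. The sketch below assumes that the percentages add together when both add-ons are enabled. The table does not say whether they stack additively or compound, so treat that as an assumption and confirm it in your order form. The function and dictionary names are hypothetical.

```python
from decimal import Decimal

# Percentage add-ons from the table, as (low, high) fractions of token spend.
TOKEN_RATE_ADDONS = {
    "zero_retention": (Decimal("0.10"), Decimal("0.20")),
    "data_residency": (Decimal("0.05"), Decimal("0.12")),
}

def token_spend_with_addons(base_spend, addons):
    """Return (low, high) token spend in USD after percentage add-ons.

    Assumption: surcharges are summed, not compounded.
    """
    low_pct = sum((TOKEN_RATE_ADDONS[name][0] for name in addons), Decimal(0))
    high_pct = sum((TOKEN_RATE_ADDONS[name][1] for name in addons), Decimal(0))
    base = Decimal(base_spend)
    return base * (1 + low_pct), base * (1 + high_pct)

# Hypothetical: $1,000/mo of token spend with both add-ons enabled.
low, high = token_spend_with_addons("1000", ["zero_retention", "data_residency"])
# low == 1150 and high == 1320 (USD per month)
```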

Pricing FAQ

Common questions from finance reviews.

Why ranges instead of single prices?

Because the rate genuinely varies. A request routed to a US-West shared pool at off-peak isn't priced the same as one served by a Tokyo dedicated replica under peak load. Publishing one number would mean either charging too little to stay in business or overcharging customers who don't need the premium path. Ranges are honest; the exact rate is in your response headers and your invoice.

How do committed rates work?

You commit to a monthly token volume (or a dollar floor) for a term of 3, 6, or 12 months. In exchange, your per-token rate is locked at a single number inside the public range for the duration. Unused commitment doesn't roll over by default; we can quote a roll-over clause if it matters to your forecasting.
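For forecasting, a dollar-floor commitment without a roll-over clause can be modeled as a monthly minimum. The sketch below assumes that usage above the floor is billed at the same locked rate, so the invoice is the larger of the floor and actual usage. The answer above does not spell out overage terms, so that is an assumption, and the function name is hypothetical.

```python
from decimal import Decimal

def committed_invoice(usage_spend, dollar_floor):
    """Return (billed, unused) in USD for one month of a committed plan.

    usage_spend is token usage priced at the locked rate.
    Assumption: no roll-over, and overage is billed at the locked rate.
    """
    usage = Decimal(usage_spend)
    floor = Decimal(dollar_floor)
    billed = max(usage, floor)
    unused = max(floor - usage, Decimal(0))  # commitment that does not carry over
    return billed, unused

# Hypothetical months against a $2,000 floor:
quiet_month = committed_invoice("1400", "2000")  # billed 2000, unused 600
busy_month = committed_invoice("2600", "2000")   # billed 2600, unused 0
```

If unused commitment is common in your forecast, that is the case where a roll-over clause is worth quoting.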

How is billing measured?

Per token, as reported by the upstream provider, with no rounding-up. Cached prompt tokens are billed at the standard cached rate (usually 25–50% of input price, depending on the model). Tool-use and structured-output overhead is included in the token count; we don't charge a separate "agent surcharge."
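As a worked example of the cached-token discount, the sketch below prices the input side of a request mix where part of the prompt is served from cache. The function name, the $3.00 per 1M input price, and the token counts are assumptions for illustration. The 0.25 and 0.50 fractions are the two ends of the "usually 25–50%" range quoted above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def input_cost(prompt_tokens, cached_tokens, input_price_per_million, cached_fraction):
    """Return input-side cost in USD when some prompt tokens are cached.

    cached_fraction is the cached rate as a fraction of the input price
    (for example "0.25" for 25%).
    """
    price = Decimal(input_price_per_million)
    uncached = Decimal(prompt_tokens - cached_tokens)
    cached = Decimal(cached_tokens)
    return (uncached * price + cached * price * Decimal(cached_fraction)) / MILLION

# Hypothetical: 10M prompt tokens, 6M of them cached, at $3.00 per 1M input.
best_case = input_cost(10_000_000, 6_000_000, "3.00", "0.25")   # == 16.5
worst_case = input_cost(10_000_000, 6_000_000, "3.00", "0.50")  # == 21
no_cache = input_cost(10_000_000, 0, "3.00", "0.25")            # == 30
```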

Do you offer free credits or a trial?

Yes — every new account starts with a trial credit large enough to validate a real workload, not just hello-world. Email sales@idclinks.com with a short description of what you'd test and we'll size the credit accordingly.

What payment methods do you accept?

Credit card, ACH/wire, and invoice (net-30) for committed customers. We can support most procurement systems and master service agreements; talk to sales for the paperwork.

Want a quote for your shape?