- All catalog models
- Shared inference pools
- Streaming, tools, vision, JSON mode
- Email support, business hours
- 10K req/min default ceiling
Per-token rates depend on the model tier, your region, and whether you're on shared or dedicated capacity. We publish ranges so you can size a workload before talking to anyone, then quote a specific rate once the shape of the workload is clear.
Ranges represent the spread between regions, capacity types, and load conditions. The rate you actually pay is determined by your route and is shown in the response headers of every request.
| Model | Tier | Context | Input / 1M tokens | Output / 1M tokens |
|---|---|---|---|---|
| Claude Opus 4.7 | frontier | 1M | $13.00 – $18.00 | $65.00 – $90.00 |
| Gemini 3.1 Pro | frontier | 1M | $2.20 – $3.50 | $13.00 – $18.00 |
| GPT 5.5 | frontier | 1M | $5.00 – $8.00 | $30.00 – $40.00 |
| Claude Sonnet 4.6 | balanced | 1M | $3.00 – $4.50 | $15.00 – $20.00 |
| GPT-5.4 mini | balanced | 400K | $0.80 – $1.40 | $4.50 – $6.50 |
| Gemini 3 Flash | balanced | 1M | $0.55 – $1.00 | $3.00 – $4.50 |
| Mistral Large 3 | balanced | 256K | $0.55 – $1.20 | $1.60 – $3.50 |
| Llama 4 Maverick | efficient | 1M | $0.20 – $0.55 | $0.70 – $1.60 |
| DeepSeek V4 | efficient | 1M | $0.20 – $0.50 | $0.80 – $1.80 |
| Qwen3-Max | efficient | 256K | $0.85 – $1.40 | $4.00 – $5.50 |
| Command R+ | efficient | 128K | $2.50 – $3.50 | $10.00 – $13.00 |
| Phi-4 | efficient | 16K | $0.14 – $0.25 | $0.50 – $0.80 |
All ranges are in USD. Committed customers get a flat, locked rate inside the published range; the exact rate is set in your order form. Dedicated capacity is priced per replica-month and quoted separately.
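
To size a workload against the published ranges, multiply monthly token volume (in millions) by the low and high ends of the input and output rates. The sketch below does this for the Claude Sonnet 4.6 row; the `RateRange` class, the `monthly_token_cost` helper, and the 200M input / 50M output token volumes are illustrative assumptions, not part of any published SDK or a real customer profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRange:
    """Published price range in USD per 1M tokens (hypothetical helper type)."""
    low: float
    high: float


def monthly_token_cost(input_tokens: int, output_tokens: int,
                       input_rate: RateRange, output_rate: RateRange):
    """Return the (low, high) monthly token bill in USD for the given volumes."""
    m_in = input_tokens / 1_000_000    # input volume in millions of tokens
    m_out = output_tokens / 1_000_000  # output volume in millions of tokens
    low = m_in * input_rate.low + m_out * output_rate.low
    high = m_in * input_rate.high + m_out * output_rate.high
    return low, high


# Claude Sonnet 4.6 ranges, copied from the table above.
sonnet_in = RateRange(3.00, 4.50)
sonnet_out = RateRange(15.00, 20.00)

# Assumed workload: 200M input tokens and 50M output tokens per month.
low, high = monthly_token_cost(200_000_000, 50_000_000, sonnet_in, sonnet_out)
print(f"${low:,.2f} - ${high:,.2f}")  # prints $1,350.00 - $1,900.00
```

The result is only a bracket. The rate you are actually billed depends on your route, so treat the low and high figures as planning bounds rather than a quote.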
Most workloads only need the per-token meter. These line items show up for teams who need more control over where their inference runs.
| Add-on | What it covers | Price range |
|---|---|---|
| Dedicated replica | Reserved GPU capacity for a single model, in a region of your choice. | $3,500 – $14,000 / replica · month |
| Zero-retention route | Prompt and completion bytes never leave volatile memory. Audit log only. | +10 – 20% on token rates |
| Data-residency lock | Requests pinned to a specific region with hard failover off. | +5 – 12% on token rates |
| Premium support | 24/7 paging, named contact, quarterly architecture review. | $1,500 – $4,000 per month |
| BYO-provider routing | Use your own provider key under our schema, routing, and observability layer. | $0.10 – $0.30 per 1K requests |
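
As a rough illustration of how the percentage add-ons move a token bill, the sketch below applies the zero-retention and data-residency surcharges to an assumed base bill of $1,350 to $1,900 per month. The table does not say how multiple surcharges combine, so the additive stacking here is an assumption, and `with_surcharges` is a hypothetical helper; your order form is the authority on the actual calculation.

```python
def with_surcharges(base_low: float, base_high: float, surcharges):
    """Apply percentage surcharges to a (low, high) monthly token bill.

    `surcharges` is a list of (low_pct, high_pct) tuples.
    ASSUMPTION: surcharges add together before being applied to the base;
    the pricing table does not specify additive vs. compounding.
    """
    low_mult = 1 + sum(lo for lo, _ in surcharges) / 100
    high_mult = 1 + sum(hi for _, hi in surcharges) / 100
    return base_low * low_mult, base_high * high_mult


# Percent ranges copied from the add-on table above.
ZERO_RETENTION = (10, 20)   # +10 – 20% on token rates
DATA_RESIDENCY = (5, 12)    # +5 – 12% on token rates

# Assumed base token bill in USD per month (illustrative only).
low, high = with_surcharges(1350.0, 1900.0, [ZERO_RETENTION, DATA_RESIDENCY])
print(f"${low:,.2f} - ${high:,.2f}")  # prints $1,552.50 - $2,508.00
```

Flat-fee items such as premium support or a dedicated replica are not percentages of token spend, so they would be added on top of this figure rather than passed through the helper.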