# Every major AI API, priced (April 2026)
One table, one decision rule: the lowest sticker price rarely wins. Pick the model that clears your quality bar, then optimize cost with caching, routing, and response-length caps. Here is the April 2026 snapshot across OpenAI, Anthropic, Google, Mistral, xAI, and Cohere.
| Provider | Model | Input $/MTok | Output $/MTok | Context | Notable feature |
|---|---|---|---|---|---|
| OpenAI | GPT-5 | $5.00 | $20.00 | 400k | Best structured output / tool use |
| OpenAI | GPT-5 mini | $0.40 | $1.60 | 400k | Cheap OpenAI tier |
| OpenAI | o4 | $12.00 | $48.00 | 200k | Reasoning model (slow, accurate) |
| Anthropic | Claude Opus 4.7 | $15.00 | $75.00 | 200k | Top coding + long agent |
| Anthropic | Claude Sonnet 4.5 | $3.00 | $15.00 | 200k | Production default |
| Anthropic | Claude Haiku 4 | $0.80 | $4.00 | 200k | Router / classifier |
| Google | Gemini 3 Pro | $1.25 | $10.00 | 2M | 2M context, multimodal |
| Google | Gemini 3 Flash | $0.15 | $0.60 | 1M | Cheapest production model |
| Mistral | Mistral Large 3 | $2.00 | $6.00 | 128k | EU-hosted, multilingual |
| xAI | Grok 4 | $3.00 | $15.00 | 256k | Fresh data, X integration |
| Cohere | Command R+ | $2.50 | $10.00 | 128k | RAG-optimized + private deploy |
## The real unit cost is not the sticker price
Three adjustments turn the sticker price into a true unit cost, and each is often larger than the spread between models:
- Prompt caching. Claude cache reads are 10% of input price. A 6,000-token system prompt cached at 75% hit rate drops effective input cost by 65-70%. Every major provider now offers caching; most teams forget to turn it on.
- Output token ratio. Output tokens cost 4-5× as much as input tokens. A feature that returns 1,200 tokens when 300 would do is paying 4× too much. Cap `max_tokens` aggressively.
- Retry rate. Schema validation failures, "please try again" wrappers, and agent loops silently 2-3× your effective cost. Measure retries as a first-class metric.
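The three adjustments compound, so it helps to compute them together rather than eyeball each one. A minimal sketch of the arithmetic (the cache-read discount of 10% matches the Anthropic figure above; the workload numbers are illustrative):

```python
def effective_cost_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price: float,             # $ per million input tokens
    output_price: float,            # $ per million output tokens
    cached_fraction: float = 0.0,   # share of input tokens served from cache
    cache_read_discount: float = 0.1,  # cache reads billed at 10% of input price
    retry_rate: float = 0.0,        # expected extra attempts per call
) -> float:
    """Sticker price per call, adjusted for caching and retries."""
    input_cost = (
        input_tokens * (1 - cached_fraction) * input_price
        + input_tokens * cached_fraction * input_price * cache_read_discount
    ) / 1e6
    output_cost = output_tokens * output_price / 1e6
    return (input_cost + output_cost) * (1 + retry_rate)

# Sonnet 4.5, 6k-token prompt, 400-token reply:
sticker = effective_cost_per_call(6_000, 400, 3.00, 15.00)
# Same call with 75% cache hits and a 10% retry rate:
adjusted = effective_cost_per_call(6_000, 400, 3.00, 15.00,
                                   cached_fraction=0.75, retry_rate=0.10)
```

Here `sticker` comes out to $0.024 and `adjusted` to about $0.013, i.e. caching more than pays for a modest retry rate on this workload.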
## Decision rule for picking a provider
- Already on AWS / enterprise deal: Claude via Bedrock is often the shortest path.
- EU-only data residency: Mistral Large 3 on La Plateforme or Gemini on Vertex EU.
- On-prem / private-cloud: Cohere Command R+ (AWS/Oracle) or self-hosted Llama 4 / Qwen 3.
- Multimodal (video / audio / charts): Gemini 3 Pro is the strongest multimodal tier.
- Bulk classification / ETL: Gemini 3 Flash or GPT-5 mini.
- Everything else: Sonnet 4.5 with an Opus 4.7 escalator.
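The "Sonnet with an Opus escalator" pattern is just a routing function in front of the API call. A sketch of one way to write it; the hardness hints, thresholds, and model id strings are assumptions for illustration, not provider-documented values:

```python
# Hypothetical escalation heuristic: default to the cheaper model,
# escalate when the task looks hard or a previous attempt failed.
HARD_TASK_HINTS = ("refactor", "multi-file", "prove", "debug this trace")

def pick_model(prompt: str, failed_attempts: int = 0) -> str:
    """Return Sonnet by default, Opus for hard-looking or retried tasks."""
    looks_hard = (
        len(prompt) > 20_000  # very long context: assumed proxy for difficulty
        or any(hint in prompt.lower() for hint in HARD_TASK_HINTS)
    )
    if looks_hard or failed_attempts >= 1:
        return "claude-opus-4-7"    # assumed model id; check provider docs
    return "claude-sonnet-4-5"      # assumed model id; check provider docs
```

In production the escalation signal is usually an output-validation failure rather than keyword matching, but the shape is the same: cheap model first, expensive model only on demonstrated need.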
## Per-workload cost benchmarks
Typical monthly spend for three common workloads, running the default production-grade pick from each family (assumes caching is on):
| Workload | Daily calls | Sonnet 4.5 | GPT-5 | Gemini 3 Pro |
|---|---|---|---|---|
| Support chatbot (2k in, 400 out) | 5,000 | ~$1,350 | ~$2,100 | ~$850 |
| Coding assistant (10k in, 1.5k out) | 2,000 | ~$1,800 | ~$3,000 | ~$1,300 |
| Bulk extraction (3k in, 200 out) | 50,000 | ~$1,700 | ~$1,250 | ~$780 |
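The table figures are easy to sanity-check from the pricing table above. A sketch of the raw (pre-caching) arithmetic for the support-chatbot row, assuming a 30-day month; caching then pulls these numbers down toward the table's values:

```python
def monthly_cost(daily_calls: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Raw monthly spend in dollars, no caching or retries applied."""
    per_call = (in_tok * in_price + out_tok * out_price) / 1e6
    return daily_calls * days * per_call

# Support chatbot: 5,000 calls/day, 2k tokens in, 400 tokens out
sonnet_raw = monthly_cost(5_000, 2_000, 400, 3.00, 15.00)   # $1,800
gemini_raw = monthly_cost(5_000, 2_000, 400, 1.25, 10.00)   # $975
```

Note that for this chat-shaped workload the output tokens are half (Sonnet) to nearly two-thirds (Gemini) of the raw bill, which is why capping response length matters as much as picking the cheaper model.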
- LLM API Cost Calculator – plug in tokens and volume for your specific workload.
- Which model should I use? – advisor that recommends a model from your use case.
- AI Spend Tracker – track current spend across every model in your stack.
- ChatGPT vs Claude vs Gemini – head-to-head on benchmarks, cost, and use cases.