# Every major AI API, priced (April 2026)
One table, one decision rule: the lowest sticker price rarely wins. Pick the model that clears your quality bar, then optimize cost with caching, routing, and response-length caps. Here is the April 2026 snapshot across OpenAI, Anthropic, Google, Mistral, xAI, and Cohere.
| Provider | Model | Input $/MTok | Output $/MTok | Context | Notable feature |
|---|---|---|---|---|---|
| OpenAI | GPT-5 | $5.00 | $20.00 | 400k | Best structured output / tool use |
| OpenAI | GPT-5 mini | $0.40 | $1.60 | 400k | Cheap OpenAI tier |
| OpenAI | o4 | $12.00 | $48.00 | 200k | Reasoning model (slow, accurate) |
| Anthropic | Claude Opus 4.7 | $15.00 | $75.00 | 200k | Top coding + long agent |
| Anthropic | Claude Sonnet 4.5 | $3.00 | $15.00 | 200k | Production default |
| Anthropic | Claude Haiku 4 | $0.80 | $4.00 | 200k | Router / classifier |
| Google | Gemini 3 Pro | $1.25 | $10.00 | 2M | 2M context, multimodal |
| Google | Gemini 3 Flash | $0.15 | $0.60 | 1M | Cheapest production model |
| Mistral | Mistral Large 3 | $2.00 | $6.00 | 128k | EU-hosted, multilingual |
| xAI | Grok 4 | $3.00 | $15.00 | 256k | Fresh data, X integration |
| Cohere | Command R+ | $2.50 | $10.00 | 128k | RAG-optimized + private deploy |
## The real unit cost is not the sticker price
Three adjustments turn the sticker price into a true unit cost, and each is often larger than the spread between models:
- Prompt caching. Claude cache reads are 10% of input price. A 6,000-token system prompt cached at 75% hit rate drops effective input cost by 65-70%. Every major provider now offers caching; most teams forget to turn it on.
- Output token ratio. Output tokens cost 4-5× as much as input tokens. A feature that returns 1,200 tokens when 300 would do is paying 4× too much. Cap `max_tokens` aggressively.
- Retry rate. Schema validation failures, "please try again" wrappers, and agent loops silently 2-3× your effective cost. Measure retries as a first-class metric.
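The three adjustments compound, so it helps to compute them together rather than eyeball each one. A minimal sketch of the arithmetic (the cache-read discount of 10% matches the Anthropic figure above; the workload numbers are illustrative):

```python
def effective_cost_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price: float,             # $ per million input tokens
    output_price: float,            # $ per million output tokens
    cached_fraction: float = 0.0,   # share of input tokens served from cache
    cache_read_discount: float = 0.1,  # cache reads billed at 10% of input price
    retry_rate: float = 0.0,        # expected extra attempts per call
) -> float:
    """Sticker price per call, adjusted for caching and retries."""
    input_cost = (
        input_tokens * (1 - cached_fraction) * input_price
        + input_tokens * cached_fraction * input_price * cache_read_discount
    ) / 1e6
    output_cost = output_tokens * output_price / 1e6
    return (input_cost + output_cost) * (1 + retry_rate)

# Sonnet 4.5, 6k-token prompt, 400-token reply:
sticker = effective_cost_per_call(6_000, 400, 3.00, 15.00)
# Same call with 75% cache hits and a 10% retry rate:
adjusted = effective_cost_per_call(6_000, 400, 3.00, 15.00,
                                   cached_fraction=0.75, retry_rate=0.10)
```

Here `sticker` comes out to $0.024 and `adjusted` to about $0.013, i.e. caching more than pays for a modest retry rate on this workload.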
## Decision rule for picking a provider
- Already on AWS / enterprise deal: Claude via Bedrock is often the shortest path.
- EU-only data residency: Mistral Large 3 on La Plateforme or Gemini on Vertex EU.
- On-prem / private-cloud: Cohere Command R+ (AWS/Oracle) or self-hosted Llama 4 / Qwen 3.
- Multimodal (video / audio / charts): Gemini 3 Pro is the strongest multimodal tier.
- Bulk classification / ETL: Gemini 3 Flash or GPT-5 mini.
- Everything else: Sonnet 4.5 with an Opus 4.7 escalator.
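The "Sonnet with an Opus escalator" pattern is just a routing function in front of the API call. A sketch of one way to write it; the hardness hints, thresholds, and model id strings are assumptions for illustration, not provider-documented values:

```python
# Hypothetical escalation heuristic: default to the cheaper model,
# escalate when the task looks hard or a previous attempt failed.
HARD_TASK_HINTS = ("refactor", "multi-file", "prove", "debug this trace")

def pick_model(prompt: str, failed_attempts: int = 0) -> str:
    """Return Sonnet by default, Opus for hard-looking or retried tasks."""
    looks_hard = (
        len(prompt) > 20_000  # very long context: assumed proxy for difficulty
        or any(hint in prompt.lower() for hint in HARD_TASK_HINTS)
    )
    if looks_hard or failed_attempts >= 1:
        return "claude-opus-4-7"    # assumed model id; check provider docs
    return "claude-sonnet-4-5"      # assumed model id; check provider docs
```

In production the escalation signal is usually an output-validation failure rather than keyword matching, but the shape is the same: cheap model first, expensive model only on demonstrated need.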
## Per-workload cost benchmarks
Typical monthly spend for three common workloads, running the default production-grade pick from each family (assumes caching is on):
| Workload | Daily calls | Sonnet 4.5 | GPT-5 | Gemini 3 Pro |
|---|---|---|---|---|
| Support chatbot (2k in, 400 out) | 5,000 | ~$1,350 | ~$2,100 | ~$850 |
| Coding assistant (10k in, 1.5k out) | 2,000 | ~$1,800 | ~$3,000 | ~$1,300 |
| Bulk extraction (3k in, 200 out) | 50,000 | ~$1,700 | ~$1,250 | ~$780 |
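The table figures are easy to sanity-check from the pricing table above. A sketch of the raw (pre-caching) arithmetic for the support-chatbot row, assuming a 30-day month; caching then pulls these numbers down toward the table's values:

```python
def monthly_cost(daily_calls: int, in_tok: int, out_tok: int,
                 in_price: float, out_price: float, days: int = 30) -> float:
    """Raw monthly spend in dollars, no caching or retries applied."""
    per_call = (in_tok * in_price + out_tok * out_price) / 1e6
    return daily_calls * days * per_call

# Support chatbot: 5,000 calls/day, 2k tokens in, 400 tokens out
sonnet_raw = monthly_cost(5_000, 2_000, 400, 3.00, 15.00)   # $1,800
gemini_raw = monthly_cost(5_000, 2_000, 400, 1.25, 10.00)   # $975
```

Note that for this chat-shaped workload the output tokens are half (Sonnet) to nearly two-thirds (Gemini) of the raw bill, which is why capping response length matters as much as picking the cheaper model.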
- LLM API Cost Calculator – plug in tokens and volume for your specific workload.
- Which model should I use? – advisor that recommends a model from your use case.
- AI Spend Tracker – track current spend across every model in your stack.
- ChatGPT vs Claude vs Gemini – head-to-head on benchmarks, cost, and use cases.