2026 AI Pricing Cheat Sheet

Updated April 2026

Every major AI model on one page — text, image, voice, video. Verified against provider public pricing pages. Re-checked monthly. From AI Economy Hub.

Text models — $ per million tokens

Model	Input	Output	Context	Pick when
Claude Opus 4.7	$15.00	$75.00	200k	Reasoning, hard agent loops
Claude Sonnet 4.5	$3.00	$15.00	200k	Default workhorse — cache drops input to $0.30
Claude Haiku 4	$0.80	$4.00	200k	Classifier / router tier
GPT-5	$5.00	$20.00	400k	Tool use + structured output king
GPT-5 mini	$0.40	$1.60	400k	Direct Haiku competition
GPT-4o	$2.50	$10.00	128k	Still the default for many mid-tier flows
o4 (reasoning)	$15.00	$60.00	200k	Best-in-class on math + planning
Gemini 2.5 Pro	$1.25	$10.00	1M	Cheap long-context; uneven on hard reasoning
Gemini 2.5 Flash	$0.15	$0.60	1M	Throughput champ for bulk classification
Mistral Large 2	$2.00	$6.00	128k	EU-hosted; strong on multilingual
Grok 3 (xAI)	$3.00	$15.00	131k	Realtime web + X integration
Cohere Command R+	$2.50	$10.00	128k	Enterprise RAG; strong on grounded answers
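
To turn the per-million-token prices above into a per-request figure, multiply each token count by its rate and divide by one million. A minimal Python sketch, assuming the list prices in the table; the `PRICES` dictionary, the `request_cost` helper, and the 2,000-in / 500-out request size are illustrative and not part of any provider SDK:

```python
# USD per million tokens (input, output), copied from the table above.
PRICES = {
    "Claude Sonnet 4.5": (3.00, 15.00),
    "GPT-5 mini": (0.40, 1.60),
    "Gemini 2.5 Flash": (0.15, 0.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price (no cache, no batch)."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical request size: 2,000 input tokens, 500 output tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 2_000, 500):.4f} per request")
```

Multiply the result by expected monthly request volume to compare tiers on the same workload.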

Image — per generation

Model	Price	Strength
Midjourney v7	$0.020/image (Pro)	Best-in-class style, no API
DALL·E 4 (OpenAI)	$0.040/image (1024×1024 HD)	Strong API, strict policy filter
Flux 1.1 Pro	$0.040/image	Best open-weights flagship via Replicate/Together
Stable Diffusion 4	$0.005–0.020/image (API)	Cheapest at scale; self-host saves more
Imagen 4 (Google)	$0.030/image	Cleanest text-in-image, strong realism

Voice (TTS) — per 1k characters

Model	Price	Strength
ElevenLabs v3	$0.18/1k chars (Creator)	Best voice cloning + emotion
OpenAI TTS HD	$0.030/1k chars ($30/M chars)	Cheap, lifelike, no clone
PlayHT 3.0	$0.012/1k chars	Good clone, real-time API
Cartesia Sonic	$0.05/1k chars	Sub-300ms latency, real-time
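
The voice prices are easier to compare once they share a unit and are applied to one script length. A small sketch under stated assumptions: the `TTS_PRICE_PER_1K` mapping and `tts_cost` helper are illustrative names, and the 60,000-character script is a made-up workload:

```python
# USD per 1,000 characters, taken from the table above.
# OpenAI TTS HD is listed at $30 per million characters, i.e. $0.030 per 1k.
TTS_PRICE_PER_1K = {
    "ElevenLabs v3": 0.18,
    "OpenAI TTS HD": 30 / 1_000,
    "PlayHT 3.0": 0.012,
    "Cartesia Sonic": 0.05,
}


def tts_cost(model: str, characters: int) -> float:
    """Return the USD cost of synthesizing `characters` characters."""
    return characters / 1_000 * TTS_PRICE_PER_1K[model]


SCRIPT_CHARS = 60_000  # hypothetical script length

# Print the models from cheapest to most expensive for this script.
for model in sorted(TTS_PRICE_PER_1K, key=lambda m: tts_cost(m, SCRIPT_CHARS)):
    print(f"{model}: ${tts_cost(model, SCRIPT_CHARS):.2f}")
```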

Video — per generated clip

Model	Price	Strength
OpenAI Sora 2	$1.20–$3.00 / 10s	Best fidelity, 10–20s clips
Runway Gen-4	$0.50–$1.00 / 5s	Director-grade controls
Pika 2.0	$0.20–$0.40 / 5s	Cheap social-video factory
Kling 2.0	$0.25–$0.60 / 5s	Best motion realism on humans
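
The video prices are quoted per clip, and the clip lengths differ (10s for Sora 2, 5s for the rest), so a per-second rate makes the comparison fairer. A rough sketch, assuming cost scales linearly with clip length, which providers may not guarantee; `VIDEO_PRICES` and `price_per_second` are illustrative names:

```python
# (low USD, high USD, clip length in seconds), copied from the table above.
VIDEO_PRICES = {
    "OpenAI Sora 2": (1.20, 3.00, 10),
    "Runway Gen-4": (0.50, 1.00, 5),
    "Pika 2.0": (0.20, 0.40, 5),
    "Kling 2.0": (0.25, 0.60, 5),
}


def price_per_second(model: str) -> tuple[float, float]:
    """Return the (low, high) USD price per generated second."""
    low, high, seconds = VIDEO_PRICES[model]
    return low / seconds, high / seconds


for model in VIDEO_PRICES:
    low, high = price_per_second(model)
    print(f"{model}: ${low:.2f}-${high:.2f} per second")
```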

Five rules that move the bill

Output tokens cost 3–8× input in the table above (4–5× for most models), so short responses save serious money.
Prompt caching (Anthropic, OpenAI) cuts up to 90% off the input price of repeated context such as system prompts and RAG documents.
Batch APIs (OpenAI, Anthropic, Gemini) give a flat 50% discount for non-interactive jobs.
Fine-tuning rarely beats a well-cached system prompt + retrieval until you exceed ~5M monthly calls.
Embedding cost is a rounding error; vector-DB hosting is not. At scale, Pinecone serverless costs more than self-hosted pgvector.
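
The caching and batch rules above can be checked with quick arithmetic. A minimal sketch, assuming cached input is billed at 10% of list price (the "up to 90%" best case), the batch discount is a flat 50% on the whole bill, and cache-write surcharges and discount stacking are ignored because they vary by provider. The `monthly_cost` helper and the workload numbers are hypothetical:

```python
def monthly_cost(
    calls: int,
    cached_input_tokens: int,
    fresh_input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cache_discount: float = 0.0,
    batch_discount: float = 0.0,
) -> float:
    """Return monthly USD cost; prices are USD per million tokens."""
    # Assumption: only the repeated (cacheable) input gets the cache discount.
    cached_rate = input_price * (1 - cache_discount)
    per_call = (
        cached_input_tokens * cached_rate
        + fresh_input_tokens * input_price
        + output_tokens * output_price
    ) / 1_000_000
    # Assumption: the batch discount applies to the whole per-call cost.
    return calls * per_call * (1 - batch_discount)


# Hypothetical workload on Claude Sonnet 4.5 ($3.00 in / $15.00 out):
# 100,000 calls, 4,000-token shared system prompt, 500 fresh input, 300 output.
workload = dict(
    calls=100_000,
    cached_input_tokens=4_000,
    fresh_input_tokens=500,
    output_tokens=300,
    input_price=3.00,
    output_price=15.00,
)

print(f"List price:   ${monthly_cost(**workload):.2f}")
print(f"With caching: ${monthly_cost(**workload, cache_discount=0.9):.2f}")
print(f"With batch:   ${monthly_cost(**workload, batch_discount=0.5):.2f}")
```

On this made-up workload the bill comes to about $1,800 at list price, $720 with caching, and $900 with batch. Caching wins here because the shared system prompt dominates the token count.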