2026 AI Pricing Cheat Sheet
Updated April 2026Every major AI model on one page — text, image, voice, video. Verified against provider public pricing pages. Re-checked monthly. From AI Economy Hub.
Text models — $ per million tokens
| Model | Input | Output | Context | Pick when |
| Claude Opus 4.7 | $15.00 | $75.00 | 200k | Reasoning, hard agent loops |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200k | Default workhorse — cache drops input to $0.30 |
| Claude Haiku 4 | $0.80 | $4.00 | 200k | Classifier / router tier |
| GPT-5 | $5.00 | $20.00 | 400k | Tool use + structured output king |
| GPT-5 mini | $0.40 | $1.60 | 400k | Direct Haiku competition |
| GPT-4o | $2.50 | $10.00 | 128k | Still the default for many mid-tier flows |
| o4 (reasoning) | $15.00 | $60.00 | 200k | Best-in-class on math + planning |
| Gemini 2.5 Pro | $1.25 | $10.00 | 1M | Cheap long-context; uneven on hard reasoning |
| Gemini 2.5 Flash | $0.15 | $0.60 | 1M | Throughput champ for bulk classification |
| Mistral Large 2 | $2.00 | $6.00 | 128k | EU-hosted; strong on multilingual |
| Grok 3 (xAI) | $3.00 | $15.00 | 131k | Realtime web + X integration |
| Cohere Command R+ | $2.50 | $10.00 | 128k | Enterprise RAG; strong on grounded answers |
Image — per generation
| Model | Price | Strength |
| Midjourney v7 | $0.020/image (Pro) | Best-in-class style, no API |
| DALL·E 4 (OpenAI) | $0.040/image (1024×1024 HD) | Strong API, strict policy filter |
| Flux 1.1 Pro | $0.040/image | Best open-weights flagship via Replicate/Together |
| Stable Diffusion 4 | $0.005–0.020/image (API) | Cheapest at scale; self-host saves more |
| Imagen 4 (Google) | $0.030/image | Cleanest text-in-image, strong realism |
Voice (TTS) — per 1k characters
| Model | Price | Strength |
| ElevenLabs v3 | $0.18/1k chars (Creator) | Best voice cloning + emotion |
| OpenAI TTS HD | $30/M chars | Cheap, lifelike, no clone |
| PlayHT 3.0 | $0.012/1k chars | Good clone, real-time API |
| Cartesia Sonic | $0.05/1k chars | Sub-300ms latency, real-time |
Video — per generated clip
| Model | Price | Strength |
| OpenAI Sora 2 | $1.20–$3.00 / 10s | Best fidelity, 10–20s clips |
| Runway Gen-4 | $0.50–$1.00 / 5s | Director-grade controls |
| Pika 2.0 | $0.20–$0.40 / 5s | Cheap social-video factory |
| Kling 2.0 | $0.25–$0.60 / 5s | Best motion realism on humans |
Five rules that move the bill
- Output tokens cost 4–5× input — short responses save serious money.
- Anthropic + OpenAI prompt cache shaves up to 90% off repeated context (system prompts, RAG).
- Batch APIs (OpenAI, Anthropic, Gemini) give a flat 50% discount for non-interactive jobs.
- Fine-tuning rarely beats a well-cached system prompt + retrieval until you exceed ~5M monthly calls.
- Embedding cost is a rounding error; vector-DB hosting is not. Pinecone serverless at scale > self-hosted pgvector.