Picking a Claude tier in April 2026
Anthropic's three-tier strategy (Opus / Sonnet / Haiku) gives you a ~20× price spread across the same model family. That's more range than most teams need on a single workload, but it's exactly right for a tiered architecture where different requests hit different models. Here's how to map workloads to tiers and what the April 2026 numbers really look like.
| Tier | Input $/MTok | Output $/MTok | Cache read $/MTok | Latency P50 | Best fit |
|---|---|---|---|---|---|
| Opus 4.7 | $15.00 | $75.00 | $1.50 | ~6-12s | Top 10% hardest requests, agent loops 10+ steps, research |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 | ~2-4s | Production default: chatbots, coding assistants, RAG |
| Haiku 4 | $0.80 | $4.00 | $0.08 | ~0.5-1.5s | Routing, intent, classification, extraction at scale |
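To turn the table into per-call numbers, the arithmetic is just tokens divided by a million, times the $/MTok rate. A quick sketch; the 3,000-in / 500-out example call is illustrative, not a measured workload:

```python
# Per-call cost from the table above (April 2026 $/MTok as quoted here).
PRICES = {  # tier: (input $/MTok, output $/MTok)
    "opus-4.7": (15.00, 75.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4": (0.80, 4.00),
}

def per_call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens / 1e6 * price per million tokens."""
    in_price, out_price = PRICES[tier]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: a RAG-style call with a 3,000-token prompt and a 500-token reply.
for tier in PRICES:
    print(f"{tier}: ${per_call_cost(tier, 3_000, 500):.4f}")
# Roughly $0.0825 (Opus), $0.0165 (Sonnet), $0.0044 (Haiku) per call.
```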
Opus 4.7: the quality ceiling
Opus 4.7 is the first Claude model to meaningfully beat Opus 4.1 on long-horizon agents: Anthropic's internal benchmarks show a 15-25% improvement on 20-turn agent tasks (edit, run tests, fix, repeat), and SWE-bench Verified sits at a 79.3% pass rate. Where Opus still struggles relative to GPT-5 is strict structured output under stress: you occasionally get valid JSON wrapped in commentary.
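One cheap client-side mitigation for the commentary-wrapped-JSON failure mode is to parse defensively. A minimal sketch (a common pattern, not an Anthropic-documented API):

```python
import json
import re

def extract_json(text: str):
    """Best-effort extraction of the first JSON object from a reply that
    may wrap it in prose or a markdown fence."""
    try:
        return json.loads(text)  # happy path: the reply is pure JSON
    except json.JSONDecodeError:
        pass
    # Next, try fenced code blocks, then any {...} span.
    for block in re.findall(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL):
        try:
            return json.loads(block)
        except json.JSONDecodeError:
            continue
    braced = re.search(r"\{.*\}", text, re.DOTALL)
    if braced:
        return json.loads(braced.group(0))
    raise ValueError("no parseable JSON found in model reply")
```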
Price matters here. At $15 input / $75 output per MTok, and with output token spend typically running 4-5× input spend on real workloads, Opus is genuinely expensive. Prompt caching is the rescue: cache writes cost $18.75/MTok and cache reads $1.50/MTok. A chatbot with a 5,000-token system prompt and 10 tool schemas hitting a 75% cache-hit rate drops the effective input cost of that prefix from $15/MTok to about $5.80/MTok (0.75 × $1.50 in reads + 0.25 × $18.75 in writes). Don't run Opus without caching in production.
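The cache break-even math is easy to script. A sketch using the ratios implied by the prices quoted here (reads at 10% of base input price, writes at 125%):

```python
# Effective $/MTok for a cached prompt prefix, given a cache-hit rate.
# Ratios follow the prices above: read = 10% of base input, write = 125%.

def effective_input_price(base_per_mtok: float, hit_rate: float) -> float:
    read = 0.10 * base_per_mtok   # paid on cache hits
    write = 1.25 * base_per_mtok  # paid on cache misses (the write premium)
    return hit_rate * read + (1.0 - hit_rate) * write

for rate in (0.50, 0.75, 0.90):
    price = effective_input_price(15.00, rate)  # Opus 4.7 base input price
    print(f"{rate:.0%} hit rate -> ${price:.2f}/MTok")
# Roughly $10.12, $5.81, and $3.22 at 50%, 75%, and 90% hit rates.
```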
Sonnet 4.5: the 90% answer
Sonnet 4.5 lands about 4 percentage points behind Opus on MMLU-Pro (85.2 vs 89.1), about 8 behind on SWE-bench Verified, and inside the margin of error on most writing and summarization tasks. It's 5× cheaper on both input and output and, per the P50s above, roughly 3× faster. For 90% of production workloads, Sonnet is the correct default.
When Sonnet fails: long-horizon agents with 15+ tool calls, deep multi-file refactors, research reports requiring 20k+ tokens of reasoning. That's when you escalate to Opus: not on every request, but on the tail.
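That tail escalation can be as simple as validate-then-retry. A sketch assuming the Anthropic Python SDK; the model IDs are placeholders for whatever the tier aliases are when you read this, and `passes` is whatever validation you already run (schema check, unit tests, an LLM judge):

```python
# Escalate on the tail: Sonnet answers first; only drafts that fail your
# checks pay Opus prices. Model IDs below are placeholders.
from typing import Callable

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def answer_with_escalation(prompt: str, passes: Callable[[str], bool]) -> str:
    draft = ask("claude-sonnet-4-5", prompt)  # the 90% default
    return draft if passes(draft) else ask("claude-opus-4-7", prompt)
```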
Haiku 4: routing and throughput
Haiku 4 is the killer tier most teams under-use. At $0.80 input / $4 output it's 15-20× cheaper than Opus and covers 95%+ of "small" LLM tasks: intent classification, named-entity extraction, boolean checks, routing decisions. A typical support chatbot architecture looks like this:
- Haiku 4 classifies intent and decides whether the request is easy or hard (cost: $0.001/call).
- Easy → answer with Haiku 4 + RAG ($0.003/call).
- Hard → escalate to Sonnet 4.5 ($0.012/call).
- Top 5% by complexity → Opus 4.7 ($0.08/call, cached).
Weighted average on a realistic support mix (say 80% easy, 15% Sonnet, 5% Opus): 0.80 × $0.004 + 0.15 × $0.013 + 0.05 × $0.081 ≈ $0.009/call, versus $0.08/call running Opus everywhere. Same quality where it matters, about 89% cheaper.
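In code, the routing shape is one cheap Haiku triage call followed by a dispatch. A sketch only: the model IDs and the one-word triage prompt are assumptions, not tested values.

```python
# Tiered router sketch: Haiku triages, the cheapest adequate tier answers.
from anthropic import Anthropic

client = Anthropic()

# Placeholder IDs -- substitute the current alias for each tier.
MODELS = {
    "easy": "claude-haiku-4",
    "hard": "claude-sonnet-4-5",
    "expert": "claude-opus-4-7",
}

def triage(question: str) -> str:
    """One cheap Haiku call that labels the request easy / hard / expert."""
    reply = client.messages.create(
        model=MODELS["easy"],
        max_tokens=5,
        system="Classify the support question as easy, hard, or expert. "
               "Reply with exactly one word.",
        messages=[{"role": "user", "content": question}],
    )
    label = reply.content[0].text.strip().lower()
    return label if label in MODELS else "hard"  # unknown label -> Sonnet

def answer(question: str) -> str:
    reply = client.messages.create(
        model=MODELS[triage(question)],
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    )
    return reply.content[0].text
```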
How to tier your workload in 30 minutes
- Run 50 real requests through Sonnet 4.5 and measure pass rate + cost + latency.
- Bucket failures by type. If they're "needed more reasoning" failures, retry them on Opus.
- Bucket successes. If any sub-task (routing, extraction, summarization of short text) could run on Haiku, move it.
- Ship Sonnet as default, Haiku as router, Opus as escalator. Instrument the escalation rate (a sketch follows this list).
- Review monthly. Escalation rate drifting up? That's usually a prompt regression. Drifting down? You can shrink the Opus budget.
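A minimal version of that instrumentation, assuming the router sketched earlier; the in-memory Counter is for illustration, and in production you'd emit these as metrics instead:

```python
# Track what share of traffic escalates above Haiku.
from collections import Counter

route_counts = Counter()

def record_route(tier: str) -> None:
    """Call once per request with the tier that actually served it."""
    route_counts[tier] += 1

def escalation_rate() -> float:
    """Share of traffic answered above the Haiku tier; review monthly."""
    total = sum(route_counts.values())
    if total == 0:
        return 0.0
    return (route_counts["hard"] + route_counts["expert"]) / total
```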
When to pick a different vendor entirely
- Strict JSON / function calling at scale: GPT-5 is still slightly ahead.
- 2M context or video ingestion: Gemini 3 Pro.
- EU data residency, on-prem: Cohere Command R+ or Mistral Large 3.
- Hard math / proofs: OpenAI o4 (slower, but higher AIME scores).
Benchmarks (April 2026 leaderboards)
| Benchmark | Opus 4.7 | Sonnet 4.5 | Haiku 4 |
|---|---|---|---|
| SWE-bench Verified | 79.3 | 71.0 | 44.8 |
| MMLU-Pro | 89.1 | 85.2 | 76.4 |
| GPQA Diamond | 72.0 | 68.0 | 54.0 |
| tau-bench retail | 80.0 | 78.8 | 65.0 |
| MATH 500 | 90.0 | 83.0 | 70.0 |
FAQ on tier picking
The calculator and FAQ below handle the top questions. Most teams get 80% of the value from the first three moves: turn on caching, add a Haiku router, measure escalation rate.
- Token price comparison: put your workload numbers in and see per-call cost across all three tiers.
- Prompt cache savings: quantify the 70-85% drop from caching a system prompt.
- Which model should I use? Answer six questions and get a ranked shortlist.
- AI Spend Tracker: line-item your current AI spend by workload and model.