Picking a Claude tier in April 2026
Anthropic's three-tier strategy (Opus / Sonnet / Haiku) gives you a ~20× price spread across the same model family. That's more range than most teams need on a single workload, but it's exactly right for a tiered architecture where different requests hit different models. Here's how to map workloads to tiers and what the April 2026 numbers really look like.
| Tier | Input $/MTok | Output $/MTok | Cache read $/MTok | Latency P50 | Best fit |
|---|---|---|---|---|---|
| Opus 4.7 | $15.00 | $75.00 | $1.50 | ~6-12s | Top 10% hardest requests, agent loops 10+ steps, research |
| Sonnet 4.5 | $3.00 | $15.00 | $0.30 | ~2-4s | Production default: chatbots, coding assistants, RAG |
| Haiku 4 | $0.80 | $4.00 | $0.08 | ~0.5-1.5s | Routing, intent, classification, extraction at scale |
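To turn the table into per-call numbers, the arithmetic is just tokens divided by a million, times the $/MTok rate. A quick sketch; the 3,000-in / 500-out example call is illustrative, not a measured workload:

```python
# Per-call cost from the table above (April 2026 $/MTok as quoted here).
PRICES = {  # tier: (input $/MTok, output $/MTok)
    "opus-4.7": (15.00, 75.00),
    "sonnet-4.5": (3.00, 15.00),
    "haiku-4": (0.80, 4.00),
}

def per_call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens / 1e6 * price per million tokens."""
    in_price, out_price = PRICES[tier]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Example: a RAG-style call with a 3,000-token prompt and a 500-token reply.
for tier in PRICES:
    print(f"{tier}: ${per_call_cost(tier, 3_000, 500):.4f}")
# Roughly $0.0825 (Opus), $0.0165 (Sonnet), $0.0044 (Haiku) per call.
```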
Opus 4.7: the quality ceiling
Opus 4.7 is the first Claude model to meaningfully beat Opus 4.1 on long-horizon agents: Anthropic's internal benchmarks show a 15-25% improvement on 20-turn agent tasks (edit, run tests, fix, repeat), and SWE-bench Verified sits at a 79.3% pass rate. Where Opus still struggles relative to GPT-5 is strict structured output under stress: you occasionally get valid JSON wrapped in commentary.
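One cheap client-side mitigation for the commentary-wrapped-JSON failure mode is to parse defensively. A minimal sketch (a common pattern, not an Anthropic-documented API):

```python
import json
import re

def extract_json(text: str):
    """Best-effort extraction of the first JSON object from a reply that
    may wrap it in prose or a markdown fence."""
    try:
        return json.loads(text)  # happy path: the reply is pure JSON
    except json.JSONDecodeError:
        pass
    # Next, try fenced code blocks, then any {...} span.
    for block in re.findall(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL):
        try:
            return json.loads(block)
        except json.JSONDecodeError:
            continue
    braced = re.search(r"\{.*\}", text, re.DOTALL)
    if braced:
        return json.loads(braced.group(0))
    raise ValueError("no parseable JSON found in model reply")
```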
Price matters here. At $15 input / $75 output per MTok, and with output token spend typically running 4-5× input spend on real workloads, Opus is genuinely expensive. Prompt caching is the rescue: cache writes cost $18.75/MTok and cache reads $1.50/MTok. A chatbot with a 5,000-token system prompt and 10 tool schemas hitting a 75% cache-hit rate drops the effective input cost of that prefix from $15/MTok to about $5.80/MTok (0.75 × $1.50 in reads + 0.25 × $18.75 in writes). Don't run Opus without caching in production.
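The cache break-even math is easy to script. A sketch using the ratios implied by the prices quoted here (reads at 10% of base input price, writes at 125%):

```python
# Effective $/MTok for a cached prompt prefix, given a cache-hit rate.
# Ratios follow the prices above: read = 10% of base input, write = 125%.

def effective_input_price(base_per_mtok: float, hit_rate: float) -> float:
    read = 0.10 * base_per_mtok   # paid on cache hits
    write = 1.25 * base_per_mtok  # paid on cache misses (the write premium)
    return hit_rate * read + (1.0 - hit_rate) * write

for rate in (0.50, 0.75, 0.90):
    price = effective_input_price(15.00, rate)  # Opus 4.7 base input price
    print(f"{rate:.0%} hit rate -> ${price:.2f}/MTok")
# Roughly $10.12, $5.81, and $3.22 at 50%, 75%, and 90% hit rates.
```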
Sonnet 4.5: the 90% answer
Sonnet 4.5 lands about 4 percentage points behind Opus on MMLU-Pro (85.2 vs 89.1), about 8 behind on SWE-bench Verified, and inside the margin of error on most writing and summarization tasks. It's 5× cheaper on both input and output and, per the P50s above, roughly 3× faster. For 90% of production workloads, Sonnet is the correct default.
When Sonnet fails: long-horizon agents with 15+ tool calls, deep multi-file refactors, research reports requiring 20k+ tokens of reasoning. That's when you escalate to Opus: not on every request, but on the tail.
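That tail escalation can be as simple as validate-then-retry. A sketch assuming the Anthropic Python SDK; the model IDs are placeholders for whatever the tier aliases are when you read this, and `passes` is whatever validation you already run (schema check, unit tests, an LLM judge):

```python
# Escalate on the tail: Sonnet answers first; only drafts that fail your
# checks pay Opus prices. Model IDs below are placeholders.
from typing import Callable

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def ask(model: str, prompt: str) -> str:
    msg = client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def answer_with_escalation(prompt: str, passes: Callable[[str], bool]) -> str:
    draft = ask("claude-sonnet-4-5", prompt)  # the 90% default
    return draft if passes(draft) else ask("claude-opus-4-7", prompt)
```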
Haiku 4: routing and throughput
Haiku 4 is the killer tier most teams under-use. At $0.80 input / $4 output it's 15-20× cheaper than Opus and covers 95%+ of "small" LLM tasks: intent classification, named-entity extraction, boolean checks, routing decisions. A typical support chatbot architecture looks like this:
- Haiku 4 classifies intent and decides whether the request is easy or hard (cost: $0.001/call).
- Easy → answer with Haiku 4 + RAG ($0.003/call).
- Hard → escalate to Sonnet 4.5 ($0.012/call).
- Top 5% by complexity → Opus 4.7 ($0.08/call, cached).
Weighted average on a realistic support mix (say 80% easy, 15% Sonnet, 5% Opus): 0.80 × $0.004 + 0.15 × $0.013 + 0.05 × $0.081 ≈ $0.009/call, versus $0.08/call running Opus everywhere. Same quality where it matters, about 89% cheaper.
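In code, the routing shape is one cheap Haiku triage call followed by a dispatch. A sketch only: the model IDs and the one-word triage prompt are assumptions, not tested values.

```python
# Tiered router sketch: Haiku triages, the cheapest adequate tier answers.
from anthropic import Anthropic

client = Anthropic()

# Placeholder IDs -- substitute the current alias for each tier.
MODELS = {
    "easy": "claude-haiku-4",
    "hard": "claude-sonnet-4-5",
    "expert": "claude-opus-4-7",
}

def triage(question: str) -> str:
    """One cheap Haiku call that labels the request easy / hard / expert."""
    reply = client.messages.create(
        model=MODELS["easy"],
        max_tokens=5,
        system="Classify the support question as easy, hard, or expert. "
               "Reply with exactly one word.",
        messages=[{"role": "user", "content": question}],
    )
    label = reply.content[0].text.strip().lower()
    return label if label in MODELS else "hard"  # unknown label -> Sonnet

def answer(question: str) -> str:
    reply = client.messages.create(
        model=MODELS[triage(question)],
        max_tokens=1024,
        messages=[{"role": "user", "content": question}],
    )
    return reply.content[0].text
```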
How to tier your workload in 30 minutes
- Run 50 real requests through Sonnet 4.5 and measure pass rate + cost + latency.
- Bucket failures by type. If they're "needed more reasoning" failures, retry them on Opus.
- Bucket successes. If any sub-task (routing, extraction, summarization of short text) could run on Haiku, move it.
- Ship Sonnet as default, Haiku as router, Opus as escalator. Instrument the escalation rate (a sketch follows this list).
- Review monthly. Escalation rate drifting up? That's usually a prompt regression. Drifting down? You can shrink the Opus budget.
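A minimal version of that instrumentation, assuming the router sketched earlier; the in-memory Counter is for illustration, and in production you'd emit these as metrics instead:

```python
# Track what share of traffic escalates above Haiku.
from collections import Counter

route_counts = Counter()

def record_route(tier: str) -> None:
    """Call once per request with the tier that actually served it."""
    route_counts[tier] += 1

def escalation_rate() -> float:
    """Share of traffic answered above the Haiku tier; review monthly."""
    total = sum(route_counts.values())
    if total == 0:
        return 0.0
    return (route_counts["hard"] + route_counts["expert"]) / total
```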
When to pick a different vendor entirely
- Strict JSON / function calling at scale: GPT-5 is still slightly ahead.
- 2M context or video ingestion: Gemini 3 Pro.
- EU data residency, on-prem: Cohere Command R+ or Mistral Large 3.
- Hard math / proofs: OpenAI o4 (slower, but higher AIME scores).
Benchmarks (April 2026 leaderboards)
| Benchmark | Opus 4.7 | Sonnet 4.5 | Haiku 4 |
|---|---|---|---|
| SWE-bench Verified | 79.3 | 71.0 | 44.8 |
| MMLU-Pro | 89.1 | 85.2 | 76.4 |
| GPQA Diamond | 72.0 | 68.0 | 54.0 |
| tau-bench retail | 80.0 | 78.8 | 65.0 |
| MATH 500 | 90.0 | 83.0 | 70.0 |
FAQ on tier picking
The calculator and FAQ below handle the top questions. Most teams get 80% of the value from the first three moves: turn on caching, add a Haiku router, measure escalation rate.
- Token price comparison: put your workload numbers in and see per-call cost across all three tiers.
- Prompt cache savings: quantify the 70-85% drop from caching a system prompt.
- Which model should I use? Answer six questions and get a ranked shortlist.
- AI Spend Tracker: line-item your current AI spend by workload and model.