Using this recommender
There are 20+ production-grade LLMs in April 2026 and the answer to "which one should I use?" is never universal. It depends on your task type, budget, context length, latency SLO, privacy stance, and volume. The advisor above weights each question against every candidate model and returns a ranked shortlist, not a single pick, because you should actually A/B the top 2-3 on your own prompts.
The 6 questions that matter, and why
1. Task type
Task type is the biggest single predictor of model fit. Coding workloads should not run on a non-coding-tuned model. Bulk classification should not run on Opus. Long-context document QA should probably run on Gemini 3 Pro. Map task → model, then optimize cost from there.
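The task → model mapping can be as simple as a lookup table with a sensible fallback. A minimal sketch; the model identifiers and the fallback choice are illustrative, not official API model names:

```python
# Default model per task type, following the picks in this guide.
# Identifiers are illustrative -- verify current model names before use.
TASK_DEFAULTS = {
    "coding": "opus-4.7",
    "bulk_classification": "haiku-4",
    "long_context_qa": "gemini-3-pro",
    "copywriting": "sonnet-4.5",
}

def default_model(task: str) -> str:
    """Return the default model for a task, falling back to a mid-tier pick."""
    return TASK_DEFAULTS.get(task, "sonnet-4.5")
```

Start from the table, then let cost and A/B results override individual entries.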
2. Budget sensitivity
"We'll spend whatever works" and "$500/month or we shut it off" lead to different architectures. On a tight budget, prioritize Haiku 4 or Flash as workhorse and only escalate to Sonnet/GPT-5 on hard cases. On a loose budget, you can run Sonnet 4.5 by default and still stay well under the average SaaS line item.
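The tight-budget pattern above (cheap workhorse, escalate only while there is headroom) can be sketched as a small routing function. The 80%-of-cap threshold and model names are assumptions for illustration:

```python
def route_by_budget(monthly_spend: float, budget_cap: float,
                    is_hard: bool) -> str:
    """Pick a model tier given spend so far and case difficulty.

    Tight budgets default to the cheap workhorse and escalate hard
    cases only while under 80% of the cap (threshold is illustrative).
    """
    if is_hard and monthly_spend < 0.8 * budget_cap:
        return "sonnet-4.5"  # escalate hard cases while budget allows
    return "haiku-4"         # cheap workhorse default
```

A loose-budget deployment would simply invert the default and skip the cap check.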
3. Context length
For inputs under 100k tokens, all major models are fine. Above that, Gemini 3 Pro's 2M context is uniquely useful: Claude and GPT-5 top out at 200-400k. Video and audio are also Gemini territory.
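Context length is the one question you can decide mechanically: count input tokens and branch on the ceilings quoted above. A minimal sketch; the tier labels are placeholders, and the 400k ceiling is the upper end of the range given in the text:

```python
def pick_by_context(input_tokens: int) -> str:
    """Route on input size using the context ceilings quoted in this guide."""
    if input_tokens <= 100_000:
        return "any-major-model"      # all major models handle this
    if input_tokens <= 400_000:
        return "claude-or-gpt-5"      # 200-400k ceilings
    return "gemini-3-pro"             # 2M context
```

In practice, estimate tokens with your provider's tokenizer rather than character counts.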
4. Latency SLO
Real-time chat needs P95 under 3 seconds. Haiku 4 and Gemini 3 Flash deliver that routinely. Sonnet and GPT-5 are fine for 2-6 second SLOs. o4 and Opus on hard prompts should be treated as batch jobs.
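Before holding a model to a P95 target, measure P95 the same way every time. A minimal nearest-rank sketch over your own latency samples (assumes a non-empty sample list):

```python
import math

def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank P95: the sample below which ~95% of requests fall.

    Assumes latencies_ms is non-empty.
    """
    ordered = sorted(latencies_ms)
    idx = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[idx]
```

Compare the result against your SLO per model; a good mean with a bad tail still fails real-time chat.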
5. Privacy and residency
Standard US cloud: pick anything. EU residency: Mistral Large 3 on La Plateforme, Gemini on Vertex EU, or Cohere Command R+ on private EU deploys. On-prem or air-gapped: Cohere private deploy, or self-hosted Llama 4 / Qwen 3.
6. Volume
Below 1k calls/day, optimize for quality first: the dollar difference between Opus and Haiku is ~$30/month. Above 50k calls/day, default to Haiku/Flash and escalate only on confidence triggers.
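The volume rule above reduces to a few lines of routing logic. A sketch under stated assumptions: the 1k/50k breakpoints come from the text, while the 0.7 confidence threshold and model names are illustrative:

```python
def route_by_volume(confidence: float, daily_calls: int) -> str:
    """At low volume, buy quality; at high volume, default cheap and
    escalate only when the cheap model's confidence is low."""
    if daily_calls < 1_000:
        return "opus-4.7"      # quality first: the delta is ~$30/month
    if confidence < 0.7:       # confidence trigger (threshold illustrative)
        return "sonnet-4.5"
    return "haiku-4"
```

The confidence signal itself can be a logprob, a self-rating, or a classifier score; calibrate it on your own traffic.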
Common scenarios and picks
| Scenario | Top pick | Runner-up |
|---|---|---|
| B2B support chatbot | Sonnet 4.5 + Haiku 4 router | GPT-5 + mini router |
| Coding assistant inside an IDE | Claude Code (Opus) or Cursor (Sonnet) | Copilot (GPT-5) |
| Bulk extraction from PDFs | Gemini 3 Flash | Haiku 4 |
| Competitive math / proofs | OpenAI o4 | Opus 4.7 |
| Long doc QA (200k+ tokens) | Gemini 3 Pro | Sonnet 4.5 |
| Marketing copywriting | Sonnet 4.5 | GPT-5 |
| Voice agent (low latency) | Haiku 4 or Flash | GPT-5 mini |
| Agent with 10+ tool calls | Opus 4.7 | GPT-5 |
Anti-patterns to avoid
- One model for everything. You will overpay on easy requests or under-deliver on hard ones.
- Picking the cheapest without running your prompts. A 15% quality gap wipes out a 5× price advantage on user-facing work.
- Ignoring caching. Cache write is 1.25× input; cache read is 0.1× input. You are paying 10× on repeated content if you don't cache.
- Ignoring output length. Setting max_tokens costs nothing. Set it aggressively and add "respond in ≤ N sentences" to the prompt.
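The caching arithmetic above is worth running on your own numbers. A sketch using the multipliers from this page (write = 1.25× input price, read = 0.1×); the function name and shape are illustrative:

```python
def prefix_cost(tokens: int, price_per_mtok: float,
                reads: int) -> tuple[float, float]:
    """Cost of a shared prefix sent (1 + reads) times: uncached vs cached.

    Uses the multipliers from this guide: one cache write at 1.25x the
    input price, then each subsequent hit at 0.1x.
    """
    uncached = (1 + reads) * tokens / 1e6 * price_per_mtok
    cached = (1.25 + 0.1 * reads) * tokens / 1e6 * price_per_mtok
    return uncached, cached
```

For a 1M-token prefix at $1/Mtok reused 9 times, that is $10.00 uncached vs $2.15 cached; the gap widens with every additional read.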
Related tools
- ChatGPT vs Claude vs Gemini: cross-vendor comparison in detail.
- Claude tier picker: drill into the three Claude tiers.
- LLM API Cost Calculator: plug in numbers for your shortlist.
- Prompt Template Generator: build the prompt you'll actually test with.