AI Economy Hub

Which AI model should I use?

Answer 6 questions and get a ranked shortlist of Claude, GPT, Gemini, or open-source models for your job.



Frequently asked questions

1. Is this recommender exhaustive?

It covers the 10 models that handle 95%+ of production workloads. We don't list every specialized model (medical, legal, vision-only); for niche domains, add a specialist to the shortlist after this recommender narrows the general options.

2. Why does the advisor pick Sonnet so often?

Because Sonnet 4.5 is the correct default for 90% of production workloads in April 2026. It's not a bias in the advisor; it's a reflection of the market.

3. Should I trust this more than a benchmark leaderboard?

This advisor weights price, latency, privacy, and volume: factors most leaderboards ignore. Use it for architecture decisions. Use leaderboards to compare quality within a shortlist.

4. What if my task isn't in the list?

Pick the closest task category. The advisor optimizes for general fit; you'll want to validate on 50 of your own prompts afterward.

5. Can I switch models later?

Yes. Use the LLM Migration Planner; shadow eval + canary is a 4-week process that makes swaps safe.

Using this recommender

There are 20+ production-grade LLMs in April 2026, and the answer to "which one should I use?" is never universal: it depends on your task type, budget, context length, latency SLO, privacy stance, and volume. The advisor above weights each question against every candidate model and returns a ranked shortlist rather than a single pick, because you should actually A/B the top 2-3 on your own prompts.

The 6 questions that matter, and why

1. Task type

Task type is the biggest single predictor of model fit. Coding workloads should not run on a non-coding-tuned model. Bulk classification should not run on Opus. Long-context document QA should probably run on Gemini 3 Pro. Map task → model, then optimize cost from there.

2. Budget sensitivity

"We'll spend whatever works" and "$500/month or we shut it off" lead to different architectures. On a tight budget, use Haiku 4 or Flash as the workhorse and escalate to Sonnet/GPT-5 only on hard cases. On a loose budget, you can run Sonnet 4.5 by default and still stay well under the average SaaS line item.
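The workhorse-plus-escalation pattern above can be sketched in a few lines. This is a minimal illustration, not a production router: the model names come from this article, and `call_model` is a hypothetical stand-in for your provider's SDK call (here it fakes a low-confidence result for prompts containing "hard" so the demo runs standalone).

```python
# Budget-tiered routing: run the cheap workhorse model first and escalate
# to a stronger model only when confidence falls below a threshold.

WORKHORSE = "haiku-4"        # cheap default
ESCALATION = "sonnet-4.5"    # stronger model for hard cases
CONFIDENCE_THRESHOLD = 0.7

def call_model(model: str, prompt: str) -> tuple[str, float]:
    """Hypothetical SDK wrapper. Returns (answer, confidence in [0, 1])."""
    # Replace with a real API call; this fake flags "hard" prompts
    # as low-confidence so the routing logic is demonstrable.
    confidence = 0.4 if "hard" in prompt else 0.9
    return f"{model} answer", confidence

def route(prompt: str) -> tuple[str, str]:
    """Return (model_used, answer) after at most one escalation."""
    answer, confidence = call_model(WORKHORSE, prompt)
    if confidence < CONFIDENCE_THRESHOLD:
        answer, _ = call_model(ESCALATION, prompt)
        return ESCALATION, answer
    return WORKHORSE, answer

print(route("summarize this ticket"))   # stays on the workhorse
print(route("hard multi-step proof"))   # escalates
```

The confidence signal can come from a log-probability threshold, a self-grading prompt, or a cheap classifier; the routing shape stays the same either way.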

3. Context length

For inputs under 100k tokens, all major models are fine. Above that, Gemini 3 Pro's 2M context is uniquely useful; Claude and GPT-5 top out at 200-400k tokens. Video and audio are also Gemini territory.

4. Latency SLO

Real-time chat needs P95 under 3 seconds. Haiku 4 and Gemini 3 Flash deliver that routinely. Sonnet and GPT-5 are fine for 2-6 second SLOs. Treat o4 and Opus on hard prompts as batch jobs.

5. Privacy and residency

Standard US cloud: pick anything. EU residency: Mistral Large 3 on La Plateforme, Gemini on Vertex EU, or Cohere Command R+ on private EU deploys. On-prem or air-gapped: Cohere private deploy, or self-hosted Llama 4 / Qwen 3.

6. Volume

Below 1k calls/day, optimize for quality first; the dollar difference between Opus and Haiku is ~$30/month. Above 50k calls/day, default to Haiku/Flash and escalate only on confidence triggers.

Common scenarios and picks

| Scenario | Top pick | Runner-up |
| --- | --- | --- |
| B2B support chatbot | Sonnet 4.5 + Haiku 4 router | GPT-5 + mini router |
| Coding assistant inside an IDE | Claude Code (Opus) or Cursor (Sonnet) | Copilot (GPT-5) |
| Bulk extraction from PDFs | Gemini 3 Flash | Haiku 4 |
| Competitive math / proofs | OpenAI o4 | Opus 4.7 |
| Long doc QA (200k+ tokens) | Gemini 3 Pro | Sonnet 4.5 |
| Marketing copywriting | Sonnet 4.5 | GPT-5 |
| Voice agent (low latency) | Haiku 4 or Flash | GPT-5 mini |
| Agent with 10+ tool calls | Opus 4.7 | GPT-5 |

Anti-patterns to avoid

  • One model for everything. You will overpay on easy requests or under-deliver on hard ones.
  • Picking the cheapest model without running your own prompts. A 15% quality gap wipes out a 5× price advantage on user-facing work.
  • Ignoring caching. A cache write costs 1.25× the input rate; a cache read costs 0.1×. Without caching, you pay 10× more on repeated content.
  • Ignoring output length. Setting max-tokens is free: set it aggressively and add "respond in ≤ N sentences" to the prompt.
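The caching math in the list above is worth working through once. Using the multipliers stated there (cache write at 1.25× the base input rate, cache read at 0.1×), here is a small worked example; the token counts and call volume are illustrative assumptions, not figures from this article.

```python
# Relative input-token cost of re-sending a shared prompt prefix,
# with and without prompt caching.

BASE = 1.0          # relative cost per input token (no caching)
WRITE_MULT = 1.25   # first call writes the prefix to the cache
READ_MULT = 0.10    # subsequent calls read it back

def input_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    """Relative cost of sending a shared prefix `calls` times."""
    if not cached:
        return prefix_tokens * BASE * calls
    # one cache write, then (calls - 1) cache reads
    return prefix_tokens * (WRITE_MULT + READ_MULT * (calls - 1))

uncached = input_cost(10_000, 100, cached=False)  # 1,000,000 token-units
cached = input_cost(10_000, 100, cached=True)     # 111,500 token-units
print(f"uncached: {uncached:,.0f}  cached: {cached:,.0f}")
# caching is roughly 9x cheaper on this workload
```

The break-even point is fast: the 1.25× write premium is recovered after the second call, so caching pays off for any prefix reused more than once.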