
Compute break-even

Customers needed to break even on inference and infrastructure spend.



Frequently asked questions

Should I price above cost or match competitors?

Match competitors only if your cost structure matches theirs. Otherwise price above cost — margin is survival in AI.

Compute break-even: how many customers you need to cover infra

Compute break-even is the least-sexy, most-important AI business model math. It answers: given my per-user variable cost, my fixed infra cost, and my revenue per user, how many paying users does it take before I stop losing money on the compute line? Most AI startups get this wrong in the optimistic direction, sometimes by 2–3×.

The simple formula that everyone starts with

break_even_users = fixed_monthly_cost / (revenue_per_user - variable_cost_per_user)

If you have $10k/month in fixed infra and charge $40/month with $15/user/month variable cost, you break even at 400 users. Simple math, correct as far as it goes. But it assumes average user behavior, which is where most AI startups lie to themselves.
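The formula drops straight into code. A minimal sketch (the function and its names are mine, not from any library):

```python
import math

def break_even_users(fixed_monthly_cost: float,
                     revenue_per_user: float,
                     variable_cost_per_user: float) -> int:
    """Smallest whole number of users whose contribution covers fixed infra."""
    contribution = revenue_per_user - variable_cost_per_user
    if contribution <= 0:
        raise ValueError("no break-even exists: contribution margin is non-positive")
    return math.ceil(fixed_monthly_cost / contribution)

# The example above: $10k fixed, $40/mo revenue, $15/user variable
print(break_even_users(10_000, 40, 15))  # 400
```

The `ceil` matters: 401 users at a fractional break-even of 400.2 is profitable, 400 is not.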

The power-user distribution problem

AI product usage is rarely normally distributed. A typical product has:

  • P50: 15% of average variable cost. (Uses the feature occasionally.)
  • P80: 100% of average. (The user your pricing assumes.)
  • P95: 400% of average. (Power user, cost-unprofitable.)
  • P99: 1500% of average. (Abuse/automation, heavily net-negative.)

If you model break-even using P50 variable cost, you get a rosy number. If you use mean variable cost (dragged up by the fat tail), you get a 2–3× worse number that is also more honest.

Worked example: AI meeting notes SaaS at $25/seat

  • Fixed monthly infra (vector DB, observability, baseline compute): $2,500/mo.
  • Per-user variable cost at P50: $3 (2-3 meetings/week).
  • Per-user variable cost at mean: $7 (heavy team users drag up).
  • Per-user variable cost at P95: $30 (transcribes everything, long meetings).
  • Revenue/user: $25 (seat).
  • Break-even at P50 variable cost: $2,500 / ($25 - $3) = 114 users.
  • Break-even at mean variable cost: $2,500 / ($25 - $7) = 139 users.
  • At 500 users with the $7 mean, total variable cost is $3,500/mo, of which the 25 P95 users (5%) account for $750 ($30 each, more than their $25 seat revenue). Revenue: $12,500. After the $2,500 fixed infra, profit is $6,500: a 52% net margin, not the 68% the P50 model predicted.
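Replaying the bullets above in code makes the gap explicit (a sketch; the $7 mean is taken to already include the P95 tail):

```python
import math

FIXED = 2_500                         # $/mo fixed infra
SEAT = 25                             # $/user/mo revenue
P50_COST, MEAN_COST = 3, 7            # $/user/mo variable cost

print(math.ceil(FIXED / (SEAT - P50_COST)))   # 114 users (optimistic)
print(math.ceil(FIXED / (SEAT - MEAN_COST)))  # 139 users (honest)

# Net margin at 500 users under each variable-cost assumption
for cost in (P50_COST, MEAN_COST):
    revenue = 500 * SEAT
    profit = revenue - 500 * cost - FIXED
    print(round(profit / revenue, 2))         # 0.68, then 0.52
```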

Self-host vs. API break-even

Self-hosting a Llama 4 70B or Qwen 3 72B stack on H100s is typically $3k–$10k/month fixed, plus per-token costs that are 5–15× cheaper than API. Break-even against an API stack: usually at 10–40M output tokens/month (depending on APIs compared and how well you utilize GPUs). Below that, API wins on ops and reliability. Above it, self-hosting starts winning on total cost, assuming engineer-hours are not the binding constraint.
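The crossover is just the self-host fixed cost divided by the per-token premium you stop paying the API. A sketch with placeholder prices (plug in your own blended rates; where the crossover lands is very sensitive to which API price you compare against):

```python
def self_host_crossover_m_tokens(fixed_selfhost_monthly: float,
                                 api_price_per_m: float,
                                 selfhost_price_per_m: float) -> float:
    """Monthly token volume (millions) where self-hosting matches API spend."""
    premium = api_price_per_m - selfhost_price_per_m
    if premium <= 0:
        return float("inf")  # the API is never dearer: stay on the API
    return fixed_selfhost_monthly / premium

# e.g. $6k/mo of H100 capacity vs an API at $15/M, self-host marginal at $2/M
volume_m = self_host_crossover_m_tokens(6_000, 15.0, 2.0)
```

Below the crossover volume the API's zero fixed cost wins; above it, every additional million tokens widens the self-host advantage by the premium.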

What to do with this number

  1. Set pricing so that break-even is hit at 500–2,000 customers, not 5,000+.
  2. Build power-user detection + caps or per-tier usage limits before you have 500 users.
  3. Report unit economics to investors monthly, not quarterly. AI unit costs move.
  4. Re-run this calc every time you add a feature with meaningful variable cost.

Three worked scenarios at different API cost profiles

The variable-cost input determines everything. Here is what the break-even math actually produces for three common April 2026 workloads.

Scenario 1: Support chatbot SaaS, 250k requests per client per month

Per request: 2,350 input + 280 output on Sonnet 4.5. Uncached: $2,812/mo per client. With prompt caching on the 800-token system prefix (90% read discount, 73% hit rate): $1,657. With Haiku 4 routing on 65% of FAQ-style queries: $1,062/mo per client. Price seat-based at $2,500/mo with a 300k-request cap. Fixed infra (Pinecone + Langfuse + baseline): $2,500/mo. Break-even: $2,500 / ($2,500 - $1,062) ≈ 1.7, so 2 clients. Straightforward: per-client economics are strong, so you break even on the infra floor almost immediately.
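A minimal cost-model sketch for this workload (prices in $/M tokens; the helper and its parameter names are mine). Only the uncached figure is reproduced here, since the scenario's cached figures fold in effects beyond the system prefix:

```python
def monthly_api_cost(requests: int, tokens_in: int, tokens_out: int,
                     price_in: float, price_out: float,
                     cached_prefix: int = 0, hit_rate: float = 0.0,
                     read_discount: float = 0.9) -> float:
    """Monthly API spend; cached prefix tokens get the read discount on hits."""
    saved_per_req = cached_prefix * price_in * read_discount * hit_rate
    per_req = tokens_in * price_in + tokens_out * price_out - saved_per_req
    return requests * per_req / 1e6

# Scenario 1, uncached, on Sonnet 4.5 ($3/M input, $15/M output)
print(monthly_api_cost(250_000, 2_350, 280, 3, 15))  # 2812.5
```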

Scenario 2: RAG pipeline SaaS at $49/mo/seat, 50k queries per client per month

Per query: 7,220 input + 550 output = $0.030 API cost uncached on Sonnet 4.5. With 92% cache hit on the 3,200-token system prompt: ~$0.022/query. 50k queries × $0.022 = $1,108/mo variable cost per client. Priced at $49/seat with 100 seats per client = $4,900/mo revenue; variable cost $1,108; contribution $3,792. Fixed infra $3,500/mo. Break-even: less than 1 client. Real risk is per-seat unit economics at the edges: if a seat hits 5,000 queries/mo (P95), variable cost on that one seat is $110 against $49 revenue — a negative contribution margin customer masked by the average.
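The per-seat edge case deserves a guardrail. A hypothetical check (names are mine) that surfaces the negative-contribution seats the average hides:

```python
def seat_contribution(queries_per_month: int, cost_per_query: float,
                      seat_price: float) -> float:
    """Contribution margin for one seat; negative means the seat loses money."""
    return seat_price - queries_per_month * cost_per_query

# Typical seat vs the P95 seat from the scenario above
print(round(seat_contribution(500, 0.022, 49), 2))    # 38.0
print(round(seat_contribution(5_000, 0.022, 49), 2))  # -61.0
```

Run this per seat, not per client: the client-level average of $3,792 contribution is healthy even while individual seats are underwater.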

Scenario 3: Code assistant at $40/seat, 40 queries/dev/day

8,800 queries/mo/team = $267 API cost on Sonnet 4.5 with a few Opus 4.1 escalations bringing it to $320/mo. At $40/seat × 10 devs = $400/mo revenue; $320 variable cost; contribution $80. Fixed infra $1,500/mo. Break-even: 1,500 / 80 = 19 teams. On 500 teams you have $200k revenue against $160k variable cost plus $1,500 fixed = ~20% gross margin. That is the classic "code assistant pricing is hard" result; fix it by caching aggressively, routing, and raising seat price to $50-60.

Cost levers with math that move the break-even

  • Anthropic cache (90% read discount): On a 1k-token system prompt at 200,000 queries/month, saves $540/mo ($600 uncached → $60 cached) per tenant. Applied across 500 tenants, that is $270k/mo of margin recaptured.
  • OpenAI 50% cache: Softer discount but automatic. GPT-5 drops from $5 to $2.50/M input on the cached prefix.
  • Gemini 75% context cache: Long-context workloads benefit most; 2.5 Pro drops from $1.25 to roughly $0.31/M on cached input.
  • Model routing Haiku 4 → Sonnet 4.5: 60-70% traffic on Haiku at $0.80/$4 vs Sonnet at $3/$15 saves roughly 70% on routed traffic.
  • Response cap: max_tokens = 400 instead of 4,096 shaves 20-40% off output cost on chat workloads.
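The first lever above reduces to one line of arithmetic. A sketch (the function is mine; the hit rate defaults to 100% to match the bullet's best case):

```python
def cache_savings_per_month(queries: int, cached_tokens: int,
                            price_in_per_m: float, read_discount: float,
                            hit_rate: float = 1.0) -> float:
    """$/mo saved by serving `cached_tokens` of each prompt from cache."""
    uncached_spend = queries * cached_tokens * price_in_per_m / 1e6
    return uncached_spend * read_discount * hit_rate

# Anthropic lever: 1k-token prompt, 200k queries/mo, Sonnet input at $3/M
print(cache_savings_per_month(200_000, 1_000, 3, 0.9))  # 540.0
```

Multiply by your realistic hit rate before putting the number in a board deck; 100% hits only happen when the prefix never changes.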

Model selection rules and how they shift break-even

  • Haiku 4 as default router (vs Sonnet-only): drops variable cost from $0.011/query to $0.003/query on simple classifications, which can cut break-even users roughly 3× where variable cost dominates the contribution margin.
  • Sonnet 4.5 for production general-purpose: the right default. Opus is a trap — 5× the cost for 2-3 pp quality lift on typical tasks.
  • GPT-5 mini ($0.40/$1.60) competes directly with Haiku when the downstream workflow needs strict JSON output.
  • Gemini 2.5 Flash ($0.15/$0.60) for bulk enrichment: 5× cheaper than Haiku. Validate quality on your exact prompts.

Production patterns that keep break-even honest

The break-even calc is only real if your production system actually hits the per-user variable cost you model. A fallback chain (Sonnet 4.5 → GPT-5 → Haiku 4 + simplified prompt → static error) and per-provider circuit breakers prevent 3am outages from turning a 95th-percentile customer into a 99th-percentile one. Retry budgets (3-5 attempts, hard token ceiling) stop agent loops from silently 3×-ing variable cost. Per-tenant monthly token caps are the single most important margin protection; without them, one B2B customer can burn a quarter's worth of savings in 48 hours. Expose the cap to the customer via an API — it doubles as a compliance/audit feature most buyers appreciate.
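A hypothetical per-tenant cap (the class and its names are mine), sketching the hard-cap pattern described above:

```python
from collections import defaultdict

class TenantTokenCap:
    """Monthly token budget per tenant; refuse requests once it is exhausted."""

    def __init__(self, monthly_cap: int):
        self.monthly_cap = monthly_cap
        self.used = defaultdict(int)   # reset at the start of each month

    def try_spend(self, tenant: str, tokens: int) -> bool:
        if self.used[tenant] + tokens > self.monthly_cap:
            return False               # hard cap hit: route to upsell or HTTP 429
        self.used[tenant] += tokens
        return True

caps = TenantTokenCap(monthly_cap=5_000_000)
assert caps.try_spend("acme", 4_999_000)   # within budget
assert not caps.try_spend("acme", 2_000)   # would exceed the cap: blocked
```

In production this state lives in Redis or your billing store rather than process memory, but the decision logic is the same, and exposing `used` vs `monthly_cap` through an API gives customers the audit view mentioned above.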

Frequently asked questions

How often should I recompute break-even? Monthly at minimum. AI costs move 10-30% per quarter as prices drop and your workload shifts.

What break-even user count is "healthy"? 500–2,000 users is typical for well-priced AI SaaS. Over 5,000 usually means you underpriced or over-scoped fixed infra.

Should I model break-even using median or mean variable cost? Mean, or P80. Median is optimistic and hides the fat tail that actually dominates real unit economics.

Is the self-host break-even worth chasing at Series A? Rarely. The engineering time to run Llama 4 70B reliably on H100s is 0.5-1 FTE; at Series A that FTE is better spent on product. Revisit at Series B.

How do I detect and cap power users without churning them? Soft cap at P95 with a dashboard notification; hard cap at P99. Offer an enterprise tier at 2-3× the price for the top 1% of usage.

What is the typical fixed-cost floor at Seed vs Series A? Seed: $2-5k/mo (vector DB, observability, baseline compute). Series A: $15-40k/mo as you add multi-region redundancy and eval infrastructure.

Does free-tier abuse blow up break-even? Yes, frequently. Hard-cap free tiers (10 queries/day on Haiku-only) or skip the free tier on usage-heavy products.

How much does a retention problem shift break-even? Substantially. Break-even compounds with churn: at 5% monthly churn you must replace 5% of the base every month before any signup counts as net growth, so reaching the same net break-even takes roughly 1.5× the gross adds a zero-churn model implies. Low churn and unit economics are symbiotic, not independent.
