
Compute break-even

Customers needed to break even on inference and infrastructure spend.



Frequently asked questions

Should I price above cost or match competitors?

Match competitors only if your cost structure matches theirs. Otherwise price above cost — margin is survival in AI.

Compute break-even: how many customers you need to cover infra

Compute break-even is the least-sexy, most-important AI business model math. It answers: given my per-user variable cost, my fixed infra cost, and my revenue per user, how many paying users does it take before I stop losing money on the compute line? Most AI startups get this wrong in the optimistic direction, sometimes by 2–3×.

The simple formula that everyone starts with

break_even_users = fixed_monthly_cost / (revenue_per_user - variable_cost_per_user)

If you have $10k/month in fixed infra and charge $40/month with $15/user/month variable cost, you break even at 400 users. Simple math, correct as far as it goes. But it assumes average user behavior, which is where most AI startups lie to themselves.
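The formula drops straight into code. A minimal sketch (the function and its names are mine, not from any library):

```python
import math

def break_even_users(fixed_monthly_cost: float,
                     revenue_per_user: float,
                     variable_cost_per_user: float) -> int:
    """Smallest whole number of users whose contribution covers fixed infra."""
    contribution = revenue_per_user - variable_cost_per_user
    if contribution <= 0:
        raise ValueError("no break-even exists: contribution margin is non-positive")
    return math.ceil(fixed_monthly_cost / contribution)

# The example above: $10k fixed, $40/mo revenue, $15/user variable
print(break_even_users(10_000, 40, 15))  # 400
```

The `ceil` matters: 401 users at a fractional break-even of 400.2 is profitable, 400 is not.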

The power-user distribution problem

AI product usage is rarely normally distributed. A typical product has:

  • P50: 15% of average variable cost. (Uses the feature occasionally.)
  • P80: 100% of average. (The user your pricing assumes.)
  • P95: 400% of average. (Power user, cost-unprofitable.)
  • P99: 1500% of average. (Abuse/automation, heavily net-negative.)

If you model break-even using P50 variable cost, you get a rosy number. If you use mean variable cost (dragged up by the fat tail), you get a 2–3× worse number that is also more honest.

Worked example: AI meeting notes SaaS at $25/seat

  • Fixed monthly infra (vector DB, observability, baseline compute): $2,500/mo.
  • Per-user variable cost at P50: $3 (2-3 meetings/week).
  • Per-user variable cost at mean: $7 (heavy team users drag up).
  • Per-user variable cost at P95: $30 (transcribes everything, long meetings).
  • Revenue/user: $25 (seat).
  • Break-even at P50 variable cost: $2,500 / ($25 - $3) = 114 users.
  • Break-even at mean variable cost: $2,500 / ($25 - $7) = 139 users.
  • At 500 users with the $7 mean, total variable cost is $3,500/mo, of which the 25 P95 users (5%) account for $750 ($30 each, more than their $25 seat revenue). Revenue: $12,500. After the $2,500 fixed infra, profit is $6,500: a 52% net margin, not the 68% the P50 model predicted.
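Replaying the bullets above in code makes the gap explicit (a sketch; the $7 mean is taken to already include the P95 tail):

```python
import math

FIXED = 2_500                         # $/mo fixed infra
SEAT = 25                             # $/user/mo revenue
P50_COST, MEAN_COST = 3, 7            # $/user/mo variable cost

print(math.ceil(FIXED / (SEAT - P50_COST)))   # 114 users (optimistic)
print(math.ceil(FIXED / (SEAT - MEAN_COST)))  # 139 users (honest)

# Net margin at 500 users under each variable-cost assumption
for cost in (P50_COST, MEAN_COST):
    revenue = 500 * SEAT
    profit = revenue - 500 * cost - FIXED
    print(round(profit / revenue, 2))         # 0.68, then 0.52
```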

Self-host vs. API break-even

Self-hosting a Llama 4 70B or Qwen 3 72B stack on H100s is typically $3k–$10k/month fixed, plus per-token costs that are 5–15× cheaper than API. Break-even against an API stack: usually at 10–40M output tokens/month (depending on APIs compared and how well you utilize GPUs). Below that, API wins on ops and reliability. Above it, self-hosting starts winning on total cost, assuming engineer-hours are not the binding constraint.
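The crossover is just the self-host fixed cost divided by the per-token premium you stop paying the API. A sketch with placeholder prices (plug in your own blended rates; where the crossover lands is very sensitive to which API price you compare against):

```python
def self_host_crossover_m_tokens(fixed_selfhost_monthly: float,
                                 api_price_per_m: float,
                                 selfhost_price_per_m: float) -> float:
    """Monthly token volume (millions) where self-hosting matches API spend."""
    premium = api_price_per_m - selfhost_price_per_m
    if premium <= 0:
        return float("inf")  # the API is never dearer: stay on the API
    return fixed_selfhost_monthly / premium

# e.g. $6k/mo of H100 capacity vs an API at $15/M, self-host marginal at $2/M
volume_m = self_host_crossover_m_tokens(6_000, 15.0, 2.0)
```

Below the crossover volume the API's zero fixed cost wins; above it, every additional million tokens widens the self-host advantage by the premium.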

What to do with this number

  1. Set pricing so that break-even is hit at 500–2,000 customers, not 5,000+.
  2. Build power-user detection + caps or per-tier usage limits before you have 500 users.
  3. Report unit economics to investors monthly, not quarterly. AI unit costs move.
  4. Re-run this calc every time you add a feature with meaningful variable cost.

Three worked scenarios at different API cost profiles

The variable-cost input determines everything. Here is what the break-even math actually produces for three common April 2026 workloads.

Scenario 1: Support chatbot SaaS, 250k requests per client per month

Per request: 2,350 input + 280 output on Sonnet 4.5. Uncached: $2,812/mo per client. With prompt caching on the 800-token system prefix (90% read discount, 73% hit rate): $1,657. With Haiku 4 routing on 65% of FAQ-style queries: $1,062/mo per client. Price seat-based at $2,500/mo with a 300k-request cap. Fixed infra (Pinecone + Langfuse + baseline): $2,500/mo. Break-even: $2,500 / ($2,500 - $1,062) ≈ 1.7, so 2 clients. Straightforward: per-client economics are strong, so you break even on the infra floor almost immediately.
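A minimal cost-model sketch for this workload (prices in $/M tokens; the helper and its parameter names are mine). Only the uncached figure is reproduced here, since the scenario's cached figures fold in effects beyond the system prefix:

```python
def monthly_api_cost(requests: int, tokens_in: int, tokens_out: int,
                     price_in: float, price_out: float,
                     cached_prefix: int = 0, hit_rate: float = 0.0,
                     read_discount: float = 0.9) -> float:
    """Monthly API spend; cached prefix tokens get the read discount on hits."""
    saved_per_req = cached_prefix * price_in * read_discount * hit_rate
    per_req = tokens_in * price_in + tokens_out * price_out - saved_per_req
    return requests * per_req / 1e6

# Scenario 1, uncached, on Sonnet 4.5 ($3/M input, $15/M output)
print(monthly_api_cost(250_000, 2_350, 280, 3, 15))  # 2812.5
```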

Scenario 2: RAG pipeline SaaS at $49/mo/seat, 50k queries per client per month

Per query: 7,220 input + 550 output = $0.030 API cost uncached on Sonnet 4.5. With 92% cache hit on the 3,200-token system prompt: ~$0.022/query. 50k queries × $0.022 = $1,108/mo variable cost per client. Priced at $49/seat with 100 seats per client = $4,900/mo revenue; variable cost $1,108; contribution $3,792. Fixed infra $3,500/mo. Break-even: less than 1 client. Real risk is per-seat unit economics at the edges: if a seat hits 5,000 queries/mo (P95), variable cost on that one seat is $110 against $49 revenue — a negative contribution margin customer masked by the average.
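The per-seat edge case deserves a guardrail. A hypothetical check (names are mine) that surfaces the negative-contribution seats the average hides:

```python
def seat_contribution(queries_per_month: int, cost_per_query: float,
                      seat_price: float) -> float:
    """Contribution margin for one seat; negative means the seat loses money."""
    return seat_price - queries_per_month * cost_per_query

# Typical seat vs the P95 seat from the scenario above
print(round(seat_contribution(500, 0.022, 49), 2))    # 38.0
print(round(seat_contribution(5_000, 0.022, 49), 2))  # -61.0
```

Run this per seat, not per client: the client-level average of $3,792 contribution is healthy even while individual seats are underwater.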

Scenario 3: Code assistant at $40/seat, 40 queries/dev/day

8,800 queries/mo/team = $267 API cost on Sonnet 4.5 with a few Opus 4.1 escalations bringing it to $320/mo. At $40/seat × 10 devs = $400/mo revenue; $320 variable cost; contribution $80. Fixed infra $1,500/mo. Break-even: 1,500 / 80 = 19 teams. On 500 teams you have $200k revenue against $160k variable cost plus $1,500 fixed = ~20% gross margin. That is the classic "code assistant pricing is hard" result; fix it by caching aggressively, routing, and raising seat price to $50-60.

Cost levers with math that move the break-even

  • Anthropic cache (90% read discount): On a 1k-token system prompt at 200,000 queries/month, saves $540/mo ($600 uncached → $60 cached) per tenant. Applied across 500 tenants, that is $270k/mo of margin recaptured.
  • OpenAI 50% cache: Softer discount but automatic. GPT-5 drops from $5 to $2.50/M input on the cached prefix.
  • Gemini 75% context cache: Long-context workloads benefit most; 2.5 Pro drops from $1.25 to roughly $0.31/M on cached input.
  • Model routing Haiku 4 → Sonnet 4.5: 60-70% traffic on Haiku at $0.80/$4 vs Sonnet at $3/$15 saves roughly 70% on routed traffic.
  • Response cap: max_tokens = 400 instead of 4,096 shaves 20-40% off output cost on chat workloads.
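The first lever above reduces to one line of arithmetic. A sketch (the function is mine; the hit rate defaults to 100% to match the bullet's best case):

```python
def cache_savings_per_month(queries: int, cached_tokens: int,
                            price_in_per_m: float, read_discount: float,
                            hit_rate: float = 1.0) -> float:
    """$/mo saved by serving `cached_tokens` of each prompt from cache."""
    uncached_spend = queries * cached_tokens * price_in_per_m / 1e6
    return uncached_spend * read_discount * hit_rate

# Anthropic lever: 1k-token prompt, 200k queries/mo, Sonnet input at $3/M
print(cache_savings_per_month(200_000, 1_000, 3, 0.9))  # 540.0
```

Multiply by your realistic hit rate before putting the number in a board deck; 100% hits only happen when the prefix never changes.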

Model selection rules and how they shift break-even

  • Haiku 4 as default router (vs Sonnet-only): drops variable cost from $0.011/query to $0.003/query on simple classifications, which can cut break-even users roughly 3× where variable cost dominates the contribution margin.
  • Sonnet 4.5 for production general-purpose: the right default. Opus is a trap — 5× the cost for 2-3 pp quality lift on typical tasks.
  • GPT-5 mini ($0.40/$1.60) competes directly with Haiku when the downstream workflow needs strict JSON output.
  • Gemini 2.5 Flash ($0.15/$0.60) for bulk enrichment: 5× cheaper than Haiku. Validate quality on your exact prompts.

Production patterns that keep break-even honest

The break-even calc is only real if your production system actually hits the per-user variable cost you model. A fallback chain (Sonnet 4.5 → GPT-5 → Haiku 4 + simplified prompt → static error) and per-provider circuit breakers prevent 3am outages from turning a 95th-percentile customer into a 99th-percentile one. Retry budgets (3-5 attempts, hard token ceiling) stop agent loops from silently 3×-ing variable cost. Per-tenant monthly token caps are the single most important margin protection; without them, one B2B customer can burn a quarter's worth of savings in 48 hours. Expose the cap to the customer via an API — it doubles as a compliance/audit feature most buyers appreciate.
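A hypothetical per-tenant cap (the class and its names are mine), sketching the hard-cap pattern described above:

```python
from collections import defaultdict

class TenantTokenCap:
    """Monthly token budget per tenant; refuse requests once it is exhausted."""

    def __init__(self, monthly_cap: int):
        self.monthly_cap = monthly_cap
        self.used = defaultdict(int)   # reset at the start of each month

    def try_spend(self, tenant: str, tokens: int) -> bool:
        if self.used[tenant] + tokens > self.monthly_cap:
            return False               # hard cap hit: route to upsell or HTTP 429
        self.used[tenant] += tokens
        return True

caps = TenantTokenCap(monthly_cap=5_000_000)
assert caps.try_spend("acme", 4_999_000)   # within budget
assert not caps.try_spend("acme", 2_000)   # would exceed the cap: blocked
```

In production this state lives in Redis or your billing store rather than process memory, but the decision logic is the same, and exposing `used` vs `monthly_cap` through an API gives customers the audit view mentioned above.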

Frequently asked questions

How often should I recompute break-even? Monthly at minimum. AI costs move 10-30% per quarter as prices drop and your workload shifts.

What break-even user count is "healthy"? 500–2,000 users is typical for well-priced AI SaaS. Over 5,000 usually means you underpriced or over-scoped fixed infra.

Should I model break-even using median or mean variable cost? Mean, or P80. Median is optimistic and hides the fat tail that actually dominates real unit economics.

Is the self-host break-even worth chasing at Series A? Rarely. The engineering time to run Llama 4 70B reliably on H100s is 0.5-1 FTE; at Series A that FTE is better spent on product. Revisit at Series B.

How do I detect and cap power users without churning them? Soft cap at P95 with a dashboard notification; hard cap at P99. Offer an enterprise tier at 2-3× the price for the top 1% of usage.

What is the typical fixed-cost floor at Seed vs Series A? Seed: $2-5k/mo (vector DB, observability, baseline compute). Series A: $15-40k/mo as you add multi-region redundancy and eval infrastructure.

Does free-tier abuse blow up break-even? Yes, frequently. Hard-cap free tiers (10 queries/day on Haiku-only) or skip the free tier on usage-heavy products.

How much does a retention problem shift break-even? Substantially. Break-even compounds with churn: at 5% monthly churn you must replace 5% of the base every month before any signup counts as net growth, so reaching the same net break-even takes roughly 1.5× the gross adds a zero-churn model implies. Low churn and unit economics are symbiotic, not independent.
