
Pricing an AI SaaS in 2026: three models, one trap

Pricing AI SaaS is harder than pricing traditional SaaS because the cost-of-goods-sold scales with usage in ways most founders don't model before launch. A customer who uses the product twice a week costs $2/month to serve. A customer who uses it constantly costs $200/month. Your $79/month flat plan loses you money on the second customer and subsidizes the first. Pricing model choice is therefore first-order, not cosmetic.

The three pricing archetypes

| Model | Examples | Gross margin profile | When it works |
|---|---|---|---|
| Flat seat | Notion AI ($10/mo), Copilot ($19/mo) | 70-85% at scale | Usage capped by human time; bounded workload |
| Usage-based (credits) | Replicate, Cursor overage, OpenAI API | 50-75% | High-volume agentic or automated workloads |
| Outcome-based | Intercom Fin ($0.99/resolution), Decagon (per-conversation) | 65-80% after ramp | Clear outcome metric, predictable unit cost |
| Hybrid (seat + usage) | Cursor ($20 + overage), v0 (seat + credits) | 70-80% | Most production AI SaaS land here |

The gross-margin math most founders skip

Traditional SaaS runs at 75–90% gross margin. AI SaaS runs at 50–80% depending on architecture. The lower floor is driven by API cost. A realistic unit-economic model for an AI product at $49/month per seat:

  • Revenue: $49/month.
  • Payment processing (Stripe): $1.70/month.
  • Infra baseline: $3/month.
  • LLM API cost at median usage: $8/month.
  • LLM API cost at P90 power user: $35/month.
  • Support cost: $1–2/month average.
  • Median gross margin: ~71%. P90 power user: ~15%. P99: negative.
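The same arithmetic in a few lines of Python, using the cost lines listed above. The $60/month P99 API spend is an illustrative assumption; support is taken at the $1.50 midpoint for the median customer and the $2 end for power users:

```python
def gross_margin(revenue, llm_cost, processing=1.70, infra=3.00, support=1.50):
    """Monthly gross margin per seat from the cost lines above."""
    cogs = processing + infra + support + llm_cost
    return (revenue - cogs) / revenue

median = gross_margin(49, llm_cost=8)               # ~0.71
p90 = gross_margin(49, llm_cost=35, support=2.00)   # ~0.15; power users need more support too
p99 = gross_margin(49, llm_cost=60)                 # assumed P99 API spend: negative margin
```

The P99 row is the whole argument for usage caps: a single customer whose API spend exceeds your price point has negative gross margin no matter how healthy the median looks.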

Pricing-model selection tree

  1. Is usage per customer fairly predictable? (Think: meeting notes, where your meetings are roughly the same every week.) → Flat seat pricing.
  2. Is usage highly variable and agentic? (Think: coding assistant, AI search agent.) → Seat + usage overage.
  3. Is there a clean outcome the buyer cares about? (Resolutions, signed docs, deflected tickets, qualified leads.) → Outcome-based.
  4. B2B with procurement + compliance? → Annual contracts with credit allowances + overage, not raw usage-based billing. Finance teams don't approve unpredictable usage invoices.
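The tree condenses into a straight-line function. This is a sketch: the boolean inputs simplify the four questions above, and the enterprise check is ordered first since procurement constraints override the billing mechanics:

```python
def pick_pricing_model(predictable_usage: bool, agentic: bool,
                       clean_outcome_metric: bool,
                       enterprise_procurement: bool) -> str:
    """Map the four selection-tree questions to a pricing archetype."""
    if enterprise_procurement:
        return "annual contract: credit allowance + overage"
    if clean_outcome_metric:
        return "outcome-based"
    if agentic or not predictable_usage:
        return "seat + usage overage"
    return "flat seat"
```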

The "we'll just raise prices later" trap

Many AI SaaS products in 2023–2024 launched at $20/month to win market share, assuming they would raise prices later. By 2026, the survivors had either raised prices and eaten the churn, or bled cash until they were forced to raise and eat the churn anyway. Price correctly at launch: include a 20–30% margin buffer for underpriced power users, communicate usage limits clearly, and grandfather only the first 100 customers.

Benchmarks for 2026 AI SaaS pricing

| Category | Starter | Pro / Team | Enterprise |
|---|---|---|---|
| Horizontal productivity (meeting notes) | $0 limited | $18-29/seat | $40+/seat + SSO |
| Vertical AI (legal, healthcare, finance) | $99-299/seat | $500-1500/seat | Custom, $10k+/mo |
| Agentic (coding, support) | $0 limited | $25-40 + usage | $100+ custom |
| Outcome-based (CX resolution) | Per-resolution $0.50-$2 | Annual commit + resolutions | Custom |
| Horizontal writing / creative | $15-25/seat | $30-60/seat | Custom |

Three worked scenarios: pricing against real API cost

Pricing in the abstract is a trap. Price against the actual API cost of your workload at realistic usage. Here are three products priced from the ground up with April 2026 rate cards.

Scenario 1: Support chatbot resold to SMB clients, 250,000 requests/month per client

A 20-client portfolio at 250k requests/mo each. Per request: 2,350 input + 280 output tokens on Claude Sonnet 4.5 ($3/$15). Uncached cost: 587.5M × $3 + 70M × $15 = $1,762 + $1,050 = $2,812/mo per client. Add Anthropic prompt caching on the 800-token system prefix (90% read discount, ~73% hit rate) and the bill drops to roughly $1,657. Route 65% of FAQ-style intents to Haiku 4 ($0.80/$4) and it lands at about $1,062/mo per client. Price the seat-plus-usage plan at $2,500/mo base with a 300k-request cap, overage $0.005/request. Gross margin at median usage: ($2,500 - $1,062) / $2,500 = 58%. For 20 clients that is $50k/mo revenue, $21k/mo COGS, $29k gross margin.
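The uncached baseline and the plan margin, reproduced as a quick check. The $1,062 optimized COGS is taken from the scenario text rather than rederived, since the caching arithmetic depends on hit-rate details not fully specified above:

```python
M = 1_000_000

def monthly_api_cost(requests, tok_in, tok_out, price_in, price_out):
    """Uncached monthly API cost in dollars at per-million-token rates."""
    return requests * (tok_in * price_in + tok_out * price_out) / M

uncached = monthly_api_cost(250_000, 2_350, 280, 3.00, 15.00)  # $2,812.50
optimized_cogs = 1_062      # after caching + Haiku routing (figure from the text)
margin = (2_500 - optimized_cogs) / 2_500                      # ~0.575
```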

Scenario 2: RAG pipeline for enterprise knowledge assistant, 50,000 queries/month

Per query: 3,200-token system prompt + 3,900 tokens of retrieved chunks + 120-token user question + 550-token response = 7,220 input + 550 output. Uncached Sonnet 4.5: $1,083 + $413 = $1,496/mo. With a 92% cache hit rate on the 3,200-token system prompt during business hours the input bill drops to around $695, total $1,108/mo. Cohere Rerank 3.5 at $1/1k shaves another $188 and quality typically rises; call it $920/mo all-in. Price as a per-seat annual contract at $12 per seat per month with a 1,000-seat floor. At $12k/mo MRR per client with $920 COGS, gross margin is 92%: the ceiling case most founders chase.
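The cache saving on the fixed system prompt checks out roughly as follows. This sketch ignores Anthropic's cache-write surcharge, so it lands slightly under the $1,108 quoted above:

```python
M = 1_000_000
queries = 50_000
prefix, retrieved, question, answer = 3_200, 3_900, 120, 550
tok_in = prefix + retrieved + question               # 7,220 input tokens/query

uncached = queries * (tok_in * 3.00 + answer * 15.00) / M      # ~$1,496
# 92% of queries read the 3,200-token prefix at the $0.30/M cache-read
# rate instead of $3.00/M (90% read discount on Sonnet 4.5 input).
savings = queries * 0.92 * prefix * (3.00 - 0.30) / M          # ~$397
cached_total = uncached - savings                              # ~$1,098
```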

Scenario 3: Code assistant for a 10-developer engineering team

8,800 queries/month (40/dev/day × 22 workdays × 10 devs). Per query: 5,600 input + 900 output on Sonnet 4.5 = $267/mo total, or $27/dev. Add an explicit Opus 4.1 "hard mode" on <5% of queries and it rises to roughly $320/mo. Price the seat at $40/mo with fair-use caps and an "Opus credits" overage of $0.50 per deep query. For a 10-seat team that is $400/mo revenue against $320/mo COGS: 20% gross margin. That is a warning sign: coding assistants are low-margin unless priced at $50+/seat or volume-discounted on the model spend.
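The seat math, and what it takes to escape the 20% trap. The $55 alternative price is an illustrative what-if, not from the text:

```python
M = 1_000_000
queries = 40 * 22 * 10          # 40/dev/day x 22 workdays x 10 devs = 8,800
sonnet_only = queries * (5_600 * 3.00 + 900 * 15.00) / M   # ~$267/mo
team_cogs = 320                 # with ~5% Opus hard mode (figure from the text)

def seat_margin(seat_price, seats=10, cogs=team_cogs):
    """Gross margin for the whole team at a given seat price."""
    return (seat_price * seats - cogs) / (seat_price * seats)
```

seat_margin(40) reproduces the 20% warning sign; at this usage you need roughly $55/seat before margin clears 40%.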

Cost levers you must model before quoting a price

  • Anthropic prompt caching gives a 90% discount on cache-read tokens (Sonnet 4.5 drops from $3 to $0.30 per million on reads). On a 1,000-token system prompt at 200k queries per month, that is 200M cached-read tokens per month: $60 with caching vs $600 without. A 10× saving on whatever share of your input repeats across calls.
  • OpenAI automatic prompt caching is a 50% discount on matching prefix tokens over 1,024 in length. GPT-5 drops from $5 to $2.50 per million input on the cached portion. A weaker discount than Anthropic's, but it is automatic: no code change required.
  • Gemini context caching is a 75% discount on cached tokens with an explicit caching API. Gemini 2.5 Pro drops from $1.25 to roughly $0.31 per million on cached input. Works well for long-context RAG on 1M-token windows.
  • Batch APIs give a flat 50% discount at the cost of up to 24 hours of latency. For nightly enrichment jobs, backfills, or eval runs, this is free money; for interactive SaaS it is irrelevant.
  • Response-length guardrails. Every unnecessary output token is billed at 4–5× the input rate. Capping max_tokens and writing "respond in ≤3 sentences" into the system prompt routinely shaves 20–40% off output cost.
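A quick way to compare the three caching schemes is the blended effective input price at a given hit rate. The 73% hit rate is borrowed from Scenario 1 as an illustrative figure; the sketch ignores cache-write surcharges and minimum-prefix rules:

```python
def effective_input_price(base, discount, hit_rate):
    """Blended $/M input price when hit_rate of input tokens land in cache."""
    return hit_rate * base * (1 - discount) + (1 - hit_rate) * base

anthropic = effective_input_price(3.00, 0.90, 0.73)   # Sonnet 4.5: ~$1.03/M
openai = effective_input_price(5.00, 0.50, 0.73)      # GPT-5: ~$3.18/M
gemini = effective_input_price(1.25, 0.75, 0.73)      # Gemini 2.5 Pro: ~$0.57/M
```

Note that at this hit rate, Sonnet 4.5 with caching undercuts GPT-5 with caching on input despite a lower sticker discount gap, because the base rates differ.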

Model selection rules that affect your pricing tier

  • Haiku 4 ($0.80/$4) beats Sonnet 4.5 when the task is narrow, crisp, and high-volume: intent classification, PII scrubbing, light extraction. Accepting a 2–3 percentage point quality drop saves 3–4× on cost. On a support-chatbot seat priced at $79, routing 65% of traffic to Haiku is the difference between 28% gross margin and 58%.
  • Sonnet 4.5 ($3/$15) beats Opus 4.1 ($15/$75) for 95% of production workloads. Opus is worth the 5× markup only on gnarly reasoning: multi-step legal analysis, architecture-level code review, strategic planning. If you are pricing a Pro plan that implicitly routes everything to Opus, you have a 20-point margin problem.
  • GPT-5 mini ($0.40/$1.60) is real competition to Haiku and is the usual pick when a downstream workflow already depends on OpenAI tools or strict JSON output.
  • Gemini 2.5 Flash ($0.15/$0.60) is the throughput king for bulk jobs: 5× cheaper than Haiku on input. Quality varies more by task, so validate on your exact prompts before making it the default.
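These rules condense into a small routing function. The task-kind names and the structure are illustrative, not a real library API:

```python
RATES = {  # ($/M input, $/M output), from the list above
    "haiku-4": (0.80, 4.00),
    "sonnet-4.5": (3.00, 15.00),
    "opus-4.1": (15.00, 75.00),
}

def route(task_kind: str, hard_mode: bool = False) -> str:
    """Pick the cheapest model expected to clear the task's quality bar."""
    if hard_mode:
        return "opus-4.1"      # multi-step legal analysis, deep code review
    if task_kind in {"classify", "extract", "scrub"}:
        return "haiku-4"       # narrow, crisp, high-volume tasks
    return "sonnet-4.5"        # default for production workloads
```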

Production patterns that protect your margin

The patterns that keep gross margin predictable at scale are boring but non-negotiable. Implement a fallback chain (Sonnet 4.5 → GPT-5 → Haiku 4 + simplified prompt → static error response) so a provider outage does not melt the product. Wrap every provider call in a circuit breaker that trips at a 20% error rate over a 2-minute window and cuts over to the secondary for 5 minutes before probing. Give every agent call a hard retry budget (3–5 attempts, absolute token ceiling) so a malformed prompt cannot loop and bill you $80. Enforce a per-tenant monthly token cap and expose usage to the customer via an API, both for your unit economics and as a compliance feature B2B buyers genuinely value.
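A minimal sketch of the circuit breaker described above. Thresholds come from the text; the minimum-sample guard of 10 calls is an added assumption so a tiny window cannot trip the breaker:

```python
import time
from collections import deque

class CircuitBreaker:
    """Trips at a 20% error rate over a 2-minute window, then holds
    traffic on the secondary for 5 minutes before probing again."""

    def __init__(self, threshold=0.20, window_s=120, cooldown_s=300):
        self.threshold = threshold
        self.window_s = window_s
        self.cooldown_s = cooldown_s
        self.calls = deque()      # (timestamp, ok) pairs inside the window
        self.open_until = 0.0

    def record(self, ok, now=None):
        """Log a call result and trip the breaker if the error rate crosses."""
        now = time.monotonic() if now is None else now
        self.calls.append((now, ok))
        while self.calls and self.calls[0][0] < now - self.window_s:
            self.calls.popleft()
        errors = sum(1 for _, k in self.calls if not k)
        if len(self.calls) >= 10 and errors / len(self.calls) >= self.threshold:
            self.open_until = now + self.cooldown_s   # cut over to secondary

    def allow(self, now=None):
        """True while the primary provider may receive traffic."""
        now = time.monotonic() if now is None else now
        return now >= self.open_until
```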

Frequently asked questions

Should I start with usage-based pricing? Only if your buyer is technical and your workload variance is real. For most B2B mid-market, finance teams will not approve unpredictable invoices. Seat + capped usage is the safer default.

How do I pick the credit-to-dollar ratio? Benchmark against the actual token math of a typical task and add a ~50% markup. If a meaningful query costs you $0.08 in API spend, price each credit at $0.12 and meter 1 credit per query. The markup absorbs routing mistakes and retries.
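The credit math as a one-line helper; the 50% default markup matches the worked example above:

```python
def credit_price(api_cost_per_query, markup=0.50):
    """Price per credit: measured API cost plus a routing/retry buffer."""
    return api_cost_per_query * (1 + markup)
```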

What is a realistic gross margin at scale? 65–80% for horizontal productivity SaaS, 55–70% for agentic products, 40–60% for unoptimized code assistants. Below 50% and you are one retention hiccup away from unit-economics collapse.

Should I meter cache-hit tokens back to the customer? No. Billing on a fixed "1 credit = 1 query" basis while pocketing the caching and routing savings is standard. Customers buy outcomes, not token math.

How often should I reprice? Review pricing every 6 months. Model price cuts (Sonnet 4.5 costs roughly 40% less than its Sonnet 3.7 predecessor did at launch) create margin expansion; competitive pressure and feature bloat compress it. Both motions matter.

What does grandfathering cost in practice? Typically 2–4% ARR dilution per year. Grandfather the first 50–100 customers only; after product-market fit is clear, raise prices and do not retroactively reduce for existing customers.

How do I handle the 2% of customers who cost 30% of API spend? Either cap them hard and churn them intentionally (most common answer), or move them to a committed-spend enterprise contract at a 30–50% rate premium. Do not subsidize them out of other customers' margin.

Does outcome-based pricing ever actually beat seat + usage? Yes, in two cases: when the outcome is perfectly measurable (resolutions, signed contracts, qualified leads) and the buyer is sophisticated enough to trust the metering. Intercom Fin at $0.99/resolution is the archetype; it requires a mature telemetry story to land.

When does a free tier destroy pricing? When free users consume more than 10% of the API spend of paying users and there is no natural conversion trigger. Keep free hard-capped (10 queries/day, Haiku-only) or skip the free tier entirely on usage-heavy products.
