Track AI spend the way finance tracks cloud
Teams that don't line-item AI spend end up with a single "AI" row on the P&L that grows 40% every quarter and nobody can explain why. This tracker gives you the same unit-economics view finance runs on cloud: one row per workload, one cost, and a total rolling up to the whole stack.
Why per-workload tracking beats per-model
Per-model billing ("we spent $4,200 on Claude last month") tells you nothing actionable. Per-workload billing ("the support chatbot cost $1,200, the coding assistant cost $1,800, and nightly embeddings cost $1,200") lets you ask the right question: which workload's unit economics are broken?
A B2B SaaS we audited had one workload — an internal "research assistant" — eating 40% of total spend, and nobody knew. Once it surfaced, it turned out three engineers were using it as a free chat interface with no throttling. The fix took two hours and saved $18k/year.
The columns that matter
- Workload: A human-readable name. "Support chatbot," not "claude-3-2026-01-17-prod-v2."
- Model: Which model serves the workload. The model determines the per-MTok rates applied to every other column.
- Input MTok / mo: Total input tokens across the month. Includes system prompts, tool schemas, RAG context.
- Output MTok / mo: Total output tokens. Typically 20-40% of input volume, but output tokens cost 4-8× more per token (see the price table below).
- Cache read MTok: Cached content served at 10% of base price. If this is zero on any workload with a static system prompt, you are overpaying.
Prices used in this tracker (April 2026)
| Model | Input $/MTok | Output $/MTok | Cache read $/MTok |
|---|---|---|---|
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4 | $0.80 | $4.00 | $0.08 |
| GPT-5 | $5.00 | $20.00 | $0.50 |
| GPT-5 mini | $0.40 | $1.60 | $0.04 |
| Gemini 3 Pro | $1.25 | $10.00 | $0.125 |
| Gemini 3 Flash | $0.15 | $0.60 | $0.015 |
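With the columns and the price table in hand, a workload's monthly cost is just three multiplications and a sum. A minimal sketch in Python — the price dict mirrors the table above, while the example token volumes are illustrative, not real tracker data:

```python
# April 2026 rates from the table above, in $ per MTok:
# (input, output, cache read — cache read is 10% of input)
PRICES = {
    "Claude Opus 4.7":   (15.00, 75.00, 1.50),
    "Claude Sonnet 4.5": (3.00, 15.00, 0.30),
    "Claude Haiku 4":    (0.80,  4.00, 0.08),
    "GPT-5":             (5.00, 20.00, 0.50),
    "GPT-5 mini":        (0.40,  1.60, 0.04),
    "Gemini 3 Pro":      (1.25, 10.00, 0.125),
    "Gemini 3 Flash":    (0.15,  0.60, 0.015),
}

def monthly_cost(model, input_mtok, output_mtok, cache_read_mtok=0.0):
    """Cost of one tracker row: uncached input and output at full rate,
    cache reads at the discounted rate."""
    in_rate, out_rate, cache_rate = PRICES[model]
    return (input_mtok * in_rate
            + output_mtok * out_rate
            + cache_read_mtok * cache_rate)

# Hypothetical token split for the $1,200 support chatbot from earlier:
# 200 MTok fresh input, 30 MTok output, 500 MTok served from cache.
print(f"${monthly_cost('Claude Sonnet 4.5', 200, 30, 500):,.2f}")
# → $1,200.00
```

Note how the cache column carries real money: those 500 MTok would cost $1,500 at the full input rate but only $150 as cache reads.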
What to do with the numbers
- Sort by cost. The top 3 workloads usually account for 70-80% of spend. That's where to optimize.
- Check cache ratio. Any workload with <50% cache-read ratio on a static-prompt architecture is leaving 40-70% of its input-token spend on the table.
- Check output ratio. Output tokens >40% of input on a non-generation task (classification, extraction, summarization) means the model is being chatty. Tighten max_tokens and add format constraints.
- Check model fit. Is Opus running a classification job? That's 15-20× more than a Haiku would cost for the same output quality.
- Kill dead workloads. Any line that can't name a measurable output is a candidate for shutoff.
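The checks above can be run as a single pass over the tracker rows. A sketch under stated assumptions: each row is a dict carrying the token columns plus two flags (`static_prompt`, `generation_task`) that the tracker itself doesn't store, and the sample workloads are hypothetical:

```python
def triage(rows):
    """Sort workloads by cost descending and attach the red flags
    from the checklist above. Returns (workload, cost, flags) tuples."""
    out = []
    for row in sorted(rows, key=lambda r: r["cost"], reverse=True):
        flags = []
        # Cache ratio: cached reads vs total input-side tokens.
        total_in = row["input_mtok"] + row["cache_read_mtok"]
        if (row["static_prompt"] and total_in
                and row["cache_read_mtok"] / total_in < 0.5):
            flags.append("low cache ratio")
        # Output ratio: >40% of input on a non-generation task is chatty.
        if (not row["generation_task"]
                and row["output_mtok"] > 0.4 * row["input_mtok"]):
            flags.append("chatty output")
        out.append((row["workload"], row["cost"], flags))
    return out

ROWS = [
    {"workload": "support chatbot", "cost": 1200, "input_mtok": 200,
     "output_mtok": 30, "cache_read_mtok": 500,
     "static_prompt": True, "generation_task": True},
    {"workload": "ticket tagger", "cost": 900, "input_mtok": 400,
     "output_mtok": 180, "cache_read_mtok": 0,
     "static_prompt": True, "generation_task": False},
]

for name, cost, flags in triage(ROWS):
    print(f"{name:<16} ${cost:>6,.0f}  {', '.join(flags) or 'ok'}")
```

Here the ticket tagger gets flagged twice: nothing cached despite a static prompt, and 180 MTok of output on a classification task that should emit a label, not prose.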
Exporting and sharing
The CSV export matches the columns you see. Share it with finance for forecasting. Share it with engineering as a target: cut this workload's cost 30% in 30 days. Put it in the monthly AI review.
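Since the export is just the tracker columns written row by row, reproducing it takes a few lines of Python's csv module — a sketch where the filename and the sample rows are placeholders, not the tracker's actual output:

```python
import csv

# Same columns the tracker shows, in the same order.
COLUMNS = ["Workload", "Model", "Input MTok / mo",
           "Output MTok / mo", "Cache read MTok", "Cost / mo"]

def export_csv(path, rows):
    """Write tracker rows to a CSV that finance can drop
    straight into a forecast spreadsheet."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

export_csv("ai_spend_2026_04.csv", [
    ["Support chatbot", "Claude Sonnet 4.5", 200, 30, 500, 1200],
    ["Coding assistant", "Claude Opus 4.7", 40, 14, 100, 1800],
])
```

Keeping the header row identical to the on-screen columns means the file needs no explanation when it lands in someone else's inbox.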
What this tracker doesn't cover
Infrastructure around the model (vector DBs, monitoring, orchestration), team time, and opportunity cost. For a full AI stack cost, combine this tracker with the AI Tool Stack Cost calculator. For ROI, use AI ROI. For SaaS pricing, use AI SaaS Pricing.
- AI Tool Stack Cost — Add up every AI subscription in your stack (SaaS + API).
- Prompt Cache Savings — Calculate the cache savings per workload.
- LLM API Cost Calculator — Model a single workload's cost from tokens.
- AI ROI Calculator — Once you know cost, measure the value coming out.