Track AI spend the way finance tracks cloud
Teams that don't line-item AI spend end up with a single "AI" row on the P&L that grows 40% every quarter and nobody can explain why. This tracker gives you the same unit-economics view finance runs on cloud: one row per workload, one cost, and a total rolling up to the whole stack.
Why per-workload tracking beats per-model
Per-model billing ("we spent $4,200 on Claude last month") tells you nothing actionable. Per-workload billing ("the support chatbot cost $1,200, the coding assistant cost $1,800, and nightly embeddings cost $1,200") lets you ask the right question: which workload's unit economics are broken?
A B2B SaaS we audited had one workload — an internal "research assistant" — eating 40% of total spend, and nobody knew. Once it surfaced, it turned out three engineers were using it as a free chat interface with no throttling. The fix took two hours and saved $18k/year.
The columns that matter
- Workload: A human-readable name. "Support chatbot," not "claude-3-2026-01-17-prod-v2."
- Model: Which model serves the workload. The model determines the per-MTok rates applied to every other column.
- Input MTok / mo: Total input tokens across the month. Includes system prompts, tool schemas, RAG context.
- Output MTok / mo: Total output tokens. Typically 20-40% of input volume, but output tokens cost 4-8× more per token (see the price table below).
- Cache read MTok: Cached content served at 10% of base price. If this is zero on any workload with a static system prompt, you are overpaying.
Prices used in this tracker (April 2026)
| Model | Input $/MTok | Output $/MTok | Cache read $/MTok |
|---|---|---|---|
| Claude Opus 4.7 | $15.00 | $75.00 | $1.50 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4 | $0.80 | $4.00 | $0.08 |
| GPT-5 | $5.00 | $20.00 | $0.50 |
| GPT-5 mini | $0.40 | $1.60 | $0.04 |
| Gemini 3 Pro | $1.25 | $10.00 | $0.125 |
| Gemini 3 Flash | $0.15 | $0.60 | $0.015 |
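With the columns and the price table in hand, a workload's monthly cost is just three multiplications and a sum. A minimal sketch in Python — the price dict mirrors the table above, while the example token volumes are illustrative, not real tracker data:

```python
# April 2026 rates from the table above, in $ per MTok:
# (input, output, cache read — cache read is 10% of input)
PRICES = {
    "Claude Opus 4.7":   (15.00, 75.00, 1.50),
    "Claude Sonnet 4.5": (3.00, 15.00, 0.30),
    "Claude Haiku 4":    (0.80,  4.00, 0.08),
    "GPT-5":             (5.00, 20.00, 0.50),
    "GPT-5 mini":        (0.40,  1.60, 0.04),
    "Gemini 3 Pro":      (1.25, 10.00, 0.125),
    "Gemini 3 Flash":    (0.15,  0.60, 0.015),
}

def monthly_cost(model, input_mtok, output_mtok, cache_read_mtok=0.0):
    """Cost of one tracker row: uncached input and output at full rate,
    cache reads at the discounted rate."""
    in_rate, out_rate, cache_rate = PRICES[model]
    return (input_mtok * in_rate
            + output_mtok * out_rate
            + cache_read_mtok * cache_rate)

# Hypothetical token split for the $1,200 support chatbot from earlier:
# 200 MTok fresh input, 30 MTok output, 500 MTok served from cache.
print(f"${monthly_cost('Claude Sonnet 4.5', 200, 30, 500):,.2f}")
# → $1,200.00
```

Note how the cache column carries real money: those 500 MTok would cost $1,500 at the full input rate but only $150 as cache reads.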
What to do with the numbers
- Sort by cost. The top 3 workloads usually account for 70-80% of spend. That's where to optimize.
- Check cache ratio. Any workload with <50% cache-read ratio on a static-prompt architecture is leaving 40-70% of its input-token spend on the table.
- Check output ratio. Output tokens >40% of input on a non-generation task (classification, extraction, summarization) means the model is being chatty. Tighten max_tokens and add format constraints.
- Check model fit. Is Opus running a classification job? That's 15-20× more than a Haiku would cost for the same output quality.
- Kill dead workloads. Any line that can't name a measurable output is a candidate for shutoff.
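The checks above can be run as a single pass over the tracker rows. A sketch under stated assumptions: each row is a dict carrying the token columns plus two flags (`static_prompt`, `generation_task`) that the tracker itself doesn't store, and the sample workloads are hypothetical:

```python
def triage(rows):
    """Sort workloads by cost descending and attach the red flags
    from the checklist above. Returns (workload, cost, flags) tuples."""
    out = []
    for row in sorted(rows, key=lambda r: r["cost"], reverse=True):
        flags = []
        # Cache ratio: cached reads vs total input-side tokens.
        total_in = row["input_mtok"] + row["cache_read_mtok"]
        if (row["static_prompt"] and total_in
                and row["cache_read_mtok"] / total_in < 0.5):
            flags.append("low cache ratio")
        # Output ratio: >40% of input on a non-generation task is chatty.
        if (not row["generation_task"]
                and row["output_mtok"] > 0.4 * row["input_mtok"]):
            flags.append("chatty output")
        out.append((row["workload"], row["cost"], flags))
    return out

ROWS = [
    {"workload": "support chatbot", "cost": 1200, "input_mtok": 200,
     "output_mtok": 30, "cache_read_mtok": 500,
     "static_prompt": True, "generation_task": True},
    {"workload": "ticket tagger", "cost": 900, "input_mtok": 400,
     "output_mtok": 180, "cache_read_mtok": 0,
     "static_prompt": True, "generation_task": False},
]

for name, cost, flags in triage(ROWS):
    print(f"{name:<16} ${cost:>6,.0f}  {', '.join(flags) or 'ok'}")
```

Here the ticket tagger gets flagged twice: nothing cached despite a static prompt, and 180 MTok of output on a classification task that should emit a label, not prose.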
Exporting and sharing
The CSV export matches the columns you see. Share it with finance for forecasting. Share it with engineering as a target: cut this workload's cost 30% in 30 days. Put it in the monthly AI review.
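Since the export is just the tracker columns written row by row, reproducing it takes a few lines of Python's csv module — a sketch where the filename and the sample rows are placeholders, not the tracker's actual output:

```python
import csv

# Same columns the tracker shows, in the same order.
COLUMNS = ["Workload", "Model", "Input MTok / mo",
           "Output MTok / mo", "Cache read MTok", "Cost / mo"]

def export_csv(path, rows):
    """Write tracker rows to a CSV that finance can drop
    straight into a forecast spreadsheet."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)

export_csv("ai_spend_2026_04.csv", [
    ["Support chatbot", "Claude Sonnet 4.5", 200, 30, 500, 1200],
    ["Coding assistant", "Claude Opus 4.7", 40, 14, 100, 1800],
])
```

Keeping the header row identical to the on-screen columns means the file needs no explanation when it lands in someone else's inbox.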
What this tracker doesn't cover
Infrastructure around the model (vector DBs, monitoring, orchestration), team time, and opportunity cost. For a full AI stack cost, combine this tracker with the AI Tool Stack Cost calculator. For ROI, use AI ROI. For SaaS pricing, use AI SaaS Pricing.
- AI Tool Stack Cost — Add up every AI subscription in your stack (SaaS + API).
- Prompt Cache Savings — Calculate the cache savings per workload.
- LLM API Cost Calculator — Model a single workload's cost from tokens.
- AI ROI Calculator — Once you know cost, measure the value coming out.