How to track Claude API spend in production: per-request token logging, daily aggregation, per-user attribution, alert thresholds, and a reference dashboard.
The Anthropic console shows you yesterday's bill. That's not enough. By the time a runaway agent or a chatty prompt shows up in the daily total, it has already cost you four-figures. Real Claude API budget tracking happens in your own infrastructure: every request logged with its token counts, every day rolled up by model and feature, and alerts wired before the spend gets there. This page is the minimal end-to-end pattern.
Every Claude API response includes a usage object with the exact billable tokens. Capture all four fields on every call:
import time, logging
def call_with_logging(client, **kwargs):
start = time.time()
res = client.messages.create(**kwargs)
u = res.usage
logging.info("claude_call", extra={
"model": res.model,
"input_tokens": u.input_tokens,
"output_tokens": u.output_tokens,
"cache_creation_input_tokens": getattr(u, "cache_creation_input_tokens", 0),
"cache_read_input_tokens": getattr(u, "cache_read_input_tokens", 0),
"latency_ms": int((time.time() - start) * 1000),
"feature": kwargs.get("metadata", {}).get("feature"),
"user_id": kwargs.get("metadata", {}).get("user_id"),
"request_id": res.id,
})
return res
The feature and user_id fields are where attribution lives. Set them at the call site so every request is tagged with what it's for and who it's for.
Roll up token counts to USD with a single helper:
PRICING = {
"claude-opus-4-7": {"in": 15, "out": 75, "cw": 18.75, "cr": 1.50},
"claude-sonnet-4-6": {"in": 3, "out": 15, "cw": 3.75, "cr": 0.30},
"claude-haiku-4-5": {"in": 1, "out": 5, "cw": 1.25, "cr": 0.10},
}
def cost_usd(model, in_tok, out_tok, cw_tok=0, cr_tok=0):
p = PRICING[model]
fresh_in = in_tok - cw_tok - cr_tok
return (fresh_in * p["in"]
+ cw_tok * p["cw"]
+ cr_tok * p["cr"]
+ out_tok * p["out"]) / 1_000_000
Compute this per request in a stream processor (or as a derived column in your data warehouse) — don't try to do it at log time.
Three views cover 90% of operational needs:
feature tag. Identifies which product surface is driving cost.Add average tokens-per-call and cache-hit rate as secondary panels — they're the leading indicators that warn before the spend dashboard does.
Three thresholds, all wired to the same channel (PagerDuty or a dedicated Slack):
Pair these with Anthropic console budget alerts at 50/80/100% of your monthly cap — they're the last line of defense, not the first.
For a moderately-sized SaaS running Sonnet 4.6 with caching enabled, expected averages:
| Metric | Typical range |
|---|---|
| Cost per assistant turn | $0.001–$0.005 |
| Cache hit rate | 60–85% |
| Avg input tokens per call | 2k–8k |
| Avg output tokens per call | 150–600 |
| Opus / Sonnet / Haiku mix (by request) | 5% / 70% / 25% |
If your numbers drift materially from these, that's a flag — either you have a real reason (Opus-heavy reasoning workload, long-context Q&A) or you're leaving money on the table.
Pair this page with the production checklist for the broader pre-launch view, and cost optimization for the levers to pull once tracking is in place.