Claude API Budget Tracking

How to track Claude API spend in production: per-request token logging, daily aggregation, per-user attribution, alert thresholds, and a reference dashboard.


The Anthropic console shows you yesterday's bill. That's not enough. By the time a runaway agent or a chatty prompt shows up in the daily total, it may already have cost you four figures. Real Claude API budget tracking happens in your own infrastructure: every request logged with its token counts, every day rolled up by model and feature, and alerts wired to fire before the spend gets out of hand. This page is the minimal end-to-end pattern.

Step 1: log usage on every request

Every Claude API response includes a usage object with the exact billable tokens. Capture all four fields on every call:

import logging
import time

def call_with_logging(client, *, feature=None, **kwargs):
    # `feature` is a local attribution tag for the log line only. It is kept out
    # of the request because the API's documented metadata field is user_id.
    start = time.perf_counter()
    res = client.messages.create(**kwargs)
    u = res.usage
    logging.info("claude_call", extra={
        "model": res.model,
        "input_tokens": u.input_tokens,
        "output_tokens": u.output_tokens,
        # Cache fields can be missing or None when caching is not in use.
        "cache_creation_input_tokens": getattr(u, "cache_creation_input_tokens", None) or 0,
        "cache_read_input_tokens": getattr(u, "cache_read_input_tokens", None) or 0,
        "latency_ms": int((time.perf_counter() - start) * 1000),
        "feature": feature,
        "user_id": (kwargs.get("metadata") or {}).get("user_id"),
        "request_id": res.id,
    })
    return res

The feature and user_id fields are where attribution lives. Set them at the call site so every request is tagged with what it's for and who it's for.
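One practical detail: Python's default log formatters do not print fields passed through `extra`, so the usage data can be silently dropped. The sketch below is one way to emit them as JSON lines. The class name `UsageJsonFormatter` and the `USAGE_FIELDS` tuple are illustrative names, not part of any library, and the output keys `event` and `ts` are assumptions.

```python
import json
import logging

# Fields the call site attaches through `extra=` (hypothetical list, mirroring
# the usage fields logged per request on this page).
USAGE_FIELDS = (
    "model", "input_tokens", "output_tokens",
    "cache_creation_input_tokens", "cache_read_input_tokens",
    "latency_ms", "feature", "user_id", "request_id",
)

class UsageJsonFormatter(logging.Formatter):
    """Render a log record as one JSON object, including any usage fields."""

    def format(self, record):
        payload = {"event": record.getMessage(), "ts": record.created}
        for field in USAGE_FIELDS:
            # Only copy fields that were actually attached to this record.
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(UsageJsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
```

With this handler installed, each `claude_call` log line becomes a single JSON object that a log shipper or warehouse loader can parse without custom regexes.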

Step 2: convert tokens to dollars

Roll up token counts to USD with a single helper:

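# USD per million tokens: input, output, cache write, cache read.
# Check these against Anthropic's current pricing page before relying on them.
PRICING = {
    "claude-opus-4-7":   {"in": 15, "out": 75, "cw": 18.75, "cr": 1.50},
    "claude-sonnet-4-6": {"in":  3, "out": 15, "cw":  3.75, "cr": 0.30},
    "claude-haiku-4-5":  {"in":  1, "out":  5, "cw":  1.25, "cr": 0.10},
}

def cost_usd(model, in_tok, out_tok, cw_tok=0, cr_tok=0):
    p = PRICING[model]
    # usage.input_tokens already excludes cached tokens, so cache writes and
    # cache reads are billed in addition to it, not subtracted from it.
    return (in_tok * p["in"]
            + cw_tok * p["cw"]
            + cr_tok * p["cr"]
            + out_tok * p["out"]) / 1_000_000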

Compute this per request in a stream processor (or as a derived column in your data warehouse) — don't try to do it at log time.
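A minimal sketch of that downstream step, assuming each logged request arrives as a dict carrying the usage fields plus an epoch-seconds timestamp. The `ts` field and the function names `request_cost` and `daily_spend` are hypothetical, and the sketch assumes `input_tokens` counts only uncached input, so cache reads and writes are added on top.

```python
from collections import defaultdict
from datetime import datetime, timezone

# USD per million tokens, copied from this page's pricing table (verify before use).
PRICING = {
    "claude-sonnet-4-6": {"in": 3, "out": 15, "cw": 3.75, "cr": 0.30},
    "claude-haiku-4-5": {"in": 1, "out": 5, "cw": 1.25, "cr": 0.10},
}

def request_cost(rec):
    """Cost in USD of one logged request (a dict of logged usage fields)."""
    p = PRICING[rec["model"]]
    return (rec["input_tokens"] * p["in"]
            + rec.get("cache_creation_input_tokens", 0) * p["cw"]
            + rec.get("cache_read_input_tokens", 0) * p["cr"]
            + rec["output_tokens"] * p["out"]) / 1_000_000

def daily_spend(records):
    """Sum cost per (UTC day, model, feature) key."""
    totals = defaultdict(float)
    for rec in records:
        # `ts` is an assumed epoch-seconds timestamp on each log record.
        day = datetime.fromtimestamp(rec["ts"], tz=timezone.utc).date().isoformat()
        totals[(day, rec["model"], rec.get("feature"))] += request_cost(rec)
    return dict(totals)
```

If the logged model string carries a date suffix, it would need normalizing to the pricing-table key before the lookup. Keeping raw token counts in the log and pricing them here also means a price change can be re-applied to historical data.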

Step 3: dashboards that matter

Three views cover 90% of operational needs:

  1. Spend per day by model. Stacked bar. Catches model-mix shifts (e.g. a fallback path that started routing everything to Opus).
  2. Spend per feature. Per the feature tag. Identifies which product surface is driving cost.
  3. Top users by spend. The bottom-90/top-10 split. There is almost always a single power user (or an automation script) responsible for 20–40% of daily cost.

Add average tokens-per-call and cache-hit rate as secondary panels — they're the leading indicators that warn before the spend dashboard does.
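The top-user view and the cache-hit panel both reduce to small aggregations over the logged records. The sketch below is one possible definition of each; `top_user_share` and `cache_hit_rate` are hypothetical helper names, and treating cache-hit rate as cache-read tokens divided by all input tokens is an assumption, since the page does not define the metric.

```python
def top_user_share(spend_by_user, top_fraction=0.10):
    """Fraction of total spend that comes from the top `top_fraction` of users."""
    if not spend_by_user:
        return 0.0
    ranked = sorted(spend_by_user.values(), reverse=True)
    # Always include at least one user, even for very small populations.
    k = max(1, round(len(ranked) * top_fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

def cache_hit_rate(records):
    """Share of all input tokens that were served from the prompt cache."""
    read = sum(r.get("cache_read_input_tokens", 0) for r in records)
    total = sum(
        r["input_tokens"]
        + r.get("cache_creation_input_tokens", 0)
        + r.get("cache_read_input_tokens", 0)
        for r in records
    )
    return read / total if total else 0.0
```

Tracking both numbers over time matters more than any single reading: a jump in the top-user share or a slide in the hit rate tends to show up before the daily total moves.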

Step 4: alerts

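Three thresholds, all wired to the same channel (PagerDuty or a dedicated Slack channel):

  1. Daily spend above 1.5× the rolling 7-day average. Catches sudden surges.
  2. A single user above $X in one hour. Catches runaway scripts.
  3. Cache-hit rate more than 30% below baseline. Catches accidentally broken caching.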

Pair these with Anthropic console budget alerts at 50/80/100% of your monthly cap — they're the last line of defense, not the first.
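The three checks this page recommends (daily spend above 1.5× the rolling 7-day average, a single user above a dollar limit in one hour, and cache-hit rate more than 30% below baseline) can be sketched as plain predicates. The function names are hypothetical, the per-user limit is left as a parameter because the page does not fix it, and reading "30% below baseline" as a relative drop is an assumption.

```python
from statistics import mean

def spend_surge(today_usd, last_7_days_usd, factor=1.5):
    """True when today's spend exceeds `factor` times the trailing 7-day mean."""
    if not last_7_days_usd:
        return False  # no history yet, so nothing to compare against
    return today_usd > factor * mean(last_7_days_usd)

def users_over_limit(hourly_spend_by_user, limit_usd):
    """User ids whose spend in the last hour exceeds `limit_usd`."""
    return sorted(u for u, s in hourly_spend_by_user.items() if s > limit_usd)

def cache_rate_dropped(current_rate, baseline_rate, max_drop=0.30):
    """True when the hit rate fell more than `max_drop` (relative) below baseline."""
    return baseline_rate > 0 and current_rate < baseline_rate * (1 - max_drop)
```

Each predicate takes pre-aggregated numbers, so the same functions can run from a cron job, a stream processor, or a dashboard alert rule; how the result reaches the paging channel is left out here.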

What the numbers look like

For a moderately-sized SaaS running Sonnet 4.6 with caching enabled, expected averages:

Metric                                    Typical range
Cost per assistant turn                   $0.004–$0.02
Cache hit rate                            60–85%
Avg input tokens per call                 2k–8k
Avg output tokens per call                150–600
Opus / Sonnet / Haiku mix (by request)    5% / 70% / 25%

If your numbers drift materially from these, that's a flag — either you have a real reason (Opus-heavy reasoning workload, long-context Q&A) or you're leaving money on the table.
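To sanity-check a per-turn figure against token counts and cache hit rate, the arithmetic can be written out directly. This is a rough sketch using the Sonnet prices listed on this page; `turn_cost` is a hypothetical helper, and it ignores cache-write charges for simplicity.

```python
# USD per million tokens for Sonnet, as listed on this page (verify before use).
SONNET = {"in": 3.0, "out": 15.0, "cr": 0.30}

def turn_cost(total_input, output, hit_rate, price=SONNET):
    """Approximate USD cost of one turn, ignoring cache-write charges."""
    cached = total_input * hit_rate   # input tokens read from the cache
    fresh = total_input - cached      # input tokens billed at the full rate
    return (fresh * price["in"]
            + cached * price["cr"]
            + output * price["out"]) / 1_000_000
```

Under these assumptions, a turn with 2k input and 500 output tokens comes to about $0.0135 with no cache hits and about $0.0081 when every input token is read from cache. Output tokens dominate once caching is working, which is why output length is worth watching alongside the hit rate.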

Open-source tooling

Pair this page with the production checklist for the broader pre-launch view, and cost optimization for the levers to pull once tracking is in place.

Frequently asked questions

How do I track Claude API spend in real time?
Log every request's usage object (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens) to your observability stack, convert to USD per response with a per-model pricing table, and roll up by minute/hour. The Anthropic console lags by hours; in-app logging is the only way to alert in minutes.
What is a normal Claude API cost per request?
For Sonnet 4.6 with prompt caching enabled, a typical assistant turn (2k input + 500 output) costs roughly $0.008–$0.014 at the per-token prices used on this page, depending on cache hit rate; the 500 output tokens alone account for $0.0075. Opus 4.7 is 5× more expensive. Haiku 4.5 is about 3× cheaper than Sonnet. If your average cost per turn is materially above this range, use the cost-tracking pattern on this page to find the outlier.
Can I see Claude API usage per user?
Anthropic doesn't break out usage by your end-users — the console attributes spend to API keys, not your application's users. Per-user attribution requires logging a user_id with each call into your own observability store, then rolling up there. This is the only way to identify power users or runaway automation.
What Claude API spend alerts should I set?
Three: daily spend > 1.5× rolling 7-day average (sudden surges), single user > $X in 1 hour (runaway scripts), and cache-hit rate dropping > 30% below baseline (accidentally broken caching). Plus the Anthropic console budget caps at 50/80/100% of monthly limit as a backstop.
