Claude API pricing in 2026 spans three model tiers and four billable token categories. Here is everything you pay for and how to estimate your bill.
The three Claude tiers
Tier
Model
Input ($/M tok)
Output ($/M tok)
Most capable
Claude Opus 4.7
$15
$75
Balanced
Claude Sonnet 4.6
$3
$15
Fastest / cheapest
Claude Haiku 4.5
$1
$5
Four billable token types
Input tokens — your prompt, system message, and any context.
Output tokens — what the model generates. Typically 5× more expensive than input.
Cache write tokens — first request that establishes a cached prompt prefix. ~25% more expensive than input.
Cache read tokens — subsequent reads of the cached prefix. 10% of input price.
Three ways to lower your bill
Prompt caching — cache a long system prompt or document context once, pay 10% on every reuse. Savings compound on RAG-style workloads.
Batch API — submit a JSONL of requests, get results within 24h, pay 50% of standard pricing. Ideal for offline classification, bulk extraction, eval runs.
Model routing — push easy traffic to Haiku, hard traffic to Opus. The Prompt-Pricing Recommender automates this.
Estimate your bill
The Claude Cost Calculator takes your monthly request volume, average input/output token counts, and a model choice, and gives you a monthly USD estimate including cache savings.
Frequently asked questions
What is the cheapest Claude model?
Claude Haiku 4.5 at $1 per million input tokens and $5 per million output tokens. For batch workloads that tolerate 24-hour turnaround, rates drop to $0.50/$2.50.
Does Claude charge for the system prompt on every request?
Without caching, yes — the full system prompt counts as input tokens on every API call. With prompt caching enabled, you pay the higher cache-write rate once, then only 10% of the input rate on subsequent reuses within the 5-minute TTL.
How do I get a volume discount on the Anthropic API?
Volume discounts are negotiated directly with Anthropic sales, typically at $100k+/month spend. Below that threshold, the best available discounts are the Batch API (50% off) and prompt caching (90% off cached reads).