Anthropic prices the Claude API in dollars per million tokens. This page unpacks what that means in practice and shows a worked example for a typical chat assistant.
What is a token?
A token is roughly 4 characters of English text, or about ¾ of a word. "Hello, world!" is roughly 3 tokens. The full text of Pride and Prejudice is about 165,000 tokens.
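As a rough illustration, the two rules of thumb above can be turned into a quick estimator. This is only a sketch: the function names are made up for this page, and real counts come from the model's tokenizer, so they will differ from these heuristics.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Heuristic: about 4 characters of English text per token."""
    if not text:
        return 0
    return max(1, round(len(text) / 4))


def estimate_tokens_from_words(text: str) -> int:
    """Heuristic: a token is about 3/4 of a word, so about 4/3 tokens per word."""
    word_count = len(text.split())
    return round(word_count * 4 / 3)


sample = "Hello, world!"
print(estimate_tokens_from_chars(sample))  # 3  (13 characters / 4)
print(estimate_tokens_from_words(sample))  # 3  (2 words * 4/3)
```

Both heuristics only apply to ordinary English prose; treat the result as a ballpark figure for budgeting, not a billing number.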
The four billable token types
Input tokens: the prompt you send to Claude, including any system prompt.
Output tokens: what Claude generates. Always priced at 5× the input rate.
Cache write tokens: prefix tokens written to the cache by the first request that establishes a cached prefix. Priced about 25% above the input rate.
Cache read tokens: prefix tokens served from the cache on subsequent requests. Priced at 10% of the input rate.
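Because all four types are priced relative to the input rate, the bill for a request can be sketched as one weighted sum. The snippet below is a minimal sketch under that assumption. The `Usage` class and its field names are illustrative and are not the field names of the API response, and the cache write multiplier uses 1.25 for the "about 25% more" figure above.

```python
from dataclasses import dataclass

# Multipliers relative to the input rate, taken from the list above.
OUTPUT_MULTIPLIER = 5.0
CACHE_WRITE_MULTIPLIER = 1.25  # "about 25% more than input" (approximate)
CACHE_READ_MULTIPLIER = 0.10


@dataclass
class Usage:
    """Token counts for one request or a whole workload (illustrative names)."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


def cost_usd(usage: Usage, input_rate_per_mtok: float) -> float:
    """Return the cost in dollars given the input rate per million tokens."""
    # Convert every token type into "input-token equivalents".
    weighted_tokens = (
        usage.input_tokens
        + usage.output_tokens * OUTPUT_MULTIPLIER
        + usage.cache_write_tokens * CACHE_WRITE_MULTIPLIER
        + usage.cache_read_tokens * CACHE_READ_MULTIPLIER
    )
    return weighted_tokens / 1_000_000 * input_rate_per_mtok


# One million input tokens plus one million output tokens at a $3.00 input rate.
print(cost_usd(Usage(input_tokens=1_000_000, output_tokens=1_000_000), 3.00))
```

Expressing everything as a multiple of the input rate keeps the function to a single parameter per model; if a model's rates ever depart from these ratios, pass explicit per-type rates instead.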
2026 Anthropic rates per million tokens
| Model | Input | Output | Cached read |
| --- | --- | --- | --- |
| Opus 4.7 | $15.00 | $75.00 | $1.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 |
Worked example — 100k requests on Sonnet 4.6
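Assume a chat workload of 2,000 input tokens and 400 output tokens per request. Across 100,000 requests, that is 200 million input tokens ($600 at $3.00 per million) and 40 million output tokens ($600 at $15.00 per million), or about $1,200 in total before any cache savings.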
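The arithmetic for this workload can be sketched in a few lines of Python, using the Sonnet 4.6 rates from the table above. The variable names are illustrative, and the sketch assumes no prompt caching, so every input token is billed at the full input rate.

```python
# Workload assumptions from the example above.
REQUESTS = 100_000
INPUT_TOKENS_PER_REQUEST = 2_000
OUTPUT_TOKENS_PER_REQUEST = 400

# Sonnet 4.6 rates in dollars per million tokens (from the rate table).
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

total_input_tokens = REQUESTS * INPUT_TOKENS_PER_REQUEST    # 200,000,000
total_output_tokens = REQUESTS * OUTPUT_TOKENS_PER_REQUEST  # 40,000,000

input_cost = total_input_tokens / 1_000_000 * INPUT_RATE
output_cost = total_output_tokens / 1_000_000 * OUTPUT_RATE
total_cost = input_cost + output_cost

print(f"Input:  ${input_cost:,.2f}")   # Input:  $600.00
print(f"Output: ${output_cost:,.2f}")  # Output: $600.00
print(f"Total:  ${total_cost:,.2f}")   # Total:  $1,200.00
```

Output accounts for only a sixth of the tokens here but half of the bill, which is the 5× output multiplier at work.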
Plug your numbers into the Claude Cost Calculator for a precise estimate including cache savings.
Frequently asked questions
Why is Claude output 5× more expensive than input?
Generating output is more computationally expensive than processing input. Output tokens are produced one at a time, and each one requires its own forward pass through the model, while input tokens are processed in parallel when the prompt is encoded. The 5× ratio is consistent across all three Claude tiers.
How many tokens is a typical English word?
About 1.3 tokens per English word on average, so 1,000 tokens is roughly 750 words. The ratio varies for other kinds of text: code often uses more tokens for the same amount of text, and logographic languages like Chinese can be 1–2 tokens per character.
Do system prompts count as input tokens?
Yes. System prompts are billed at the input rate just like user messages. This is why caching long system prompts saves so much — they're billed at 10% of input price on cached reads.
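To get a feel for the size of that saving, here is a minimal sketch comparing the cost of a system prompt with and without caching. The 5,000-token prompt and 1,000-request volume are hypothetical, the function name is made up for this page, and the sketch assumes a single cache write followed by cache reads on every later request (in practice a cached prefix can expire and need rewriting).

```python
def system_prompt_cost(prompt_tokens: int, requests: int,
                       input_rate_per_mtok: float, cached: bool) -> float:
    """Dollar cost of sending the same system prompt on every request."""
    if requests <= 0:
        return 0.0
    if not cached:
        # Every request pays the full input rate for the prompt.
        billable_tokens = prompt_tokens * requests
    else:
        # Assumption: one cache write (about 1.25x input), then cache reads
        # (0.10x input) for all remaining requests.
        write_tokens = prompt_tokens * 1.25
        read_tokens = prompt_tokens * (requests - 1) * 0.10
        billable_tokens = write_tokens + read_tokens
    return billable_tokens / 1_000_000 * input_rate_per_mtok


# Hypothetical workload: 5,000-token system prompt, 1,000 requests,
# Sonnet 4.6 input rate of $3.00 per million tokens.
uncached = system_prompt_cost(5_000, 1_000, 3.00, cached=False)
cached = system_prompt_cost(5_000, 1_000, 3.00, cached=True)
print(f"Uncached: ${uncached:.2f}")
print(f"Cached:   ${cached:.2f}")
```

Under these assumptions the system prompt costs about $15 uncached and about $1.52 cached. Only the system prompt portion is modeled; user messages and output tokens are billed as usual on top of this.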