Prompt caching is the single largest cost lever in the Claude API. Cached reads cost 10% of input pricing — a 90% discount that compounds across reuses.
What you can cache
System prompt (most common — long instructions, examples, persona)
By default, a cached prefix lives for 5 minutes after the most recent read. Each read refreshes the TTL. If you have steady traffic above one request every 5 minutes per prefix, the cache stays hot.
1-hour cache TTL is available at higher write cost — useful for low-traffic but expensive prefixes.
The break-even point
Cache write costs ~25% more than a fresh input token. Cache read costs 10%. So caching breaks even at ~2 reuses within the TTL window. After that, every additional reuse is pure savings.
Reuses within TTL
Effective cost vs no-cache
1
+125% (worse)
2
~67% (break-even crossed)
5
~35%
20
~16%
100
~11%
How to actually use it
In the messages API, add cache_control: { "type": "ephemeral" } to a content block. Everything before that breakpoint in the prompt gets cached. You can place up to 4 breakpoints.
Estimate your savings
The Claude Cost Calculator has a cache-utilization slider — set the fraction of input tokens that are cached and see the monthly impact.
Frequently asked questions
How long does Claude's prompt cache last?
The default TTL is 5 minutes, refreshed on each read. A 1-hour TTL is also available at a higher cache-write cost — useful when traffic is lower than one request per 5 minutes but the prefix is large and expensive.
How many cache breakpoints can I use?
Up to 4 cache breakpoints per API call. Each breakpoint marks the end of a cacheable prefix. Most teams only need 1–2: one for the system prompt and optionally one for a document context block.
Does caching work with the Batch API?
Yes — prompt caching and the Batch API stack. A cached prefix on a batch request still gets the 10% cache-read rate, and the batch 50% discount applies on top. Effective rate on cached prefixes in batch: 5% of standard input pricing.