Claude Prompt Caching, Explained

How Claude's prompt caching works in 2026: what gets cached, the 5-minute TTL, cost math, and when caching breaks even.

🔥 Launch tonight — Power Prompts PDF 50p (just 50p tonight)30 battle-tested Claude Code prompts · 8 pages · paste into CLAUDE.md · price reverts to £5

Prompt caching is the single largest cost lever in the Claude API. Cached reads cost 10% of input pricing — a 90% discount that compounds across reuses.

What you can cache

The 5-minute rule

By default, a cached prefix lives for 5 minutes after the most recent read. Each read refreshes the TTL. If you have steady traffic above one request every 5 minutes per prefix, the cache stays hot.

1-hour cache TTL is available at higher write cost — useful for low-traffic but expensive prefixes.

The break-even point

Cache write costs ~25% more than a fresh input token. Cache read costs 10%. So caching breaks even at ~2 reuses within the TTL window. After that, every additional reuse is pure savings.

Reuses within TTLEffective cost vs no-cache
1+125% (worse)
2~67% (break-even crossed)
5~35%
20~16%
100~11%

How to actually use it

In the messages API, add cache_control: { "type": "ephemeral" } to a content block. Everything before that breakpoint in the prompt gets cached. You can place up to 4 breakpoints.

Estimate your savings

The Claude Cost Calculator has a cache-utilization slider — set the fraction of input tokens that are cached and see the monthly impact.

Frequently asked questions

How long does Claude's prompt cache last?
The default TTL is 5 minutes, refreshed on each read. A 1-hour TTL is also available at a higher cache-write cost — useful when traffic is lower than one request per 5 minutes but the prefix is large and expensive.
How many cache breakpoints can I use?
Up to 4 cache breakpoints per API call. Each breakpoint marks the end of a cacheable prefix. Most teams only need 1–2: one for the system prompt and optionally one for a document context block.
Does caching work with the Batch API?
Yes — prompt caching and the Batch API stack. A cached prefix on a batch request still gets the 10% cache-read rate, and the batch 50% discount applies on top. Effective rate on cached prefixes in batch: 5% of standard input pricing.

Free tools

Cost Calculator → Prompt-Pricing Recommender → Diff Summarizer → Skills Browser →

Related

Claude Opus 4.7 vs Sonnet 4.6 Pricing (2026 Comparison)How Much Does Claude Cost? (2026 API Pricing Guide)Claude API Cost Calculator: Estimate Your Anthropic BillClaude vs GPT-4 Pricing: 2026 API Cost ComparisonAnthropic API Pricing 2026: Full Cost Reference