Extended thinking is Claude's mode where the model generates a private chain-of-thought before responding. It produces higher-quality answers on hard problems — and costs more, because the thinking tokens are billed.
How billing works
Thinking tokens are billed at the output rate (e.g., Sonnet 4.6: $15/M, Opus 4.7: $75/M).
You set a maximum thinking budget per request.
The model uses up to that budget; unused tokens aren't billed.
Thinking tokens are not visible to the end user — they exist only in the response object for inspection.
Enable extended thinking when
Multi-step reasoning problems. Math, logic puzzles, debugging from error traces.
Constraint-satisfaction tasks. "Write code that satisfies all of these requirements simultaneously."
Novel-domain analysis. Cases where the model would otherwise pattern-match to something almost-right.
Skip extended thinking when
Templated extraction. The schema constrains the answer; thinking won't add value.
Bulk classification. Pay for thinking on 1M docs and the budget will dwarf the actual response cost.
How much does it cost?
A typical 4k-token thinking budget on Sonnet 4.6 adds 4,000 × $15/M = $0.06 per request. On Opus 4.7 the same budget is 4,000 × $75/M = $0.30 per request. For a 100k/month workload, that's $6,000 of pure thinking cost on Sonnet, $30,000 on Opus.
Picking a budget
For most tasks, 2k–8k thinking tokens is enough. Going beyond 16k rarely improves output and burns budget. Start with 4k and tune from there.
Use the Claude Cost Calculator to estimate the monthly impact of enabling extended thinking on your workload.
Frequently asked questions
Are extended-thinking tokens billed?
Yes. Thinking tokens are billed at the output rate of the model in use — $15/M on Sonnet 4.6, $75/M on Opus 4.7. The model uses up to the budget you set; unused tokens aren't billed.
Does extended thinking work with prompt caching?
The cache applies to the input prompt as usual. Thinking tokens themselves are not cached — they are regenerated each request. If your workload reuses the same system prompt, you still get cache savings on input.
What's a sensible extended-thinking budget?
Start with 4,000 tokens for most reasoning tasks. Increase to 8,000–16,000 only for genuinely hard multi-step problems (debugging gnarly bugs, complex math). Beyond 16,000 the marginal quality gain is small for most tasks.