Extended Thinking vs Standard Mode

When to enable Claude's extended-thinking mode vs standard inference. Cost, latency, and quality tradeoffs with concrete examples.

Extended thinking is a Claude mode in which the model generates a private chain of thought before responding. It tends to produce higher-quality answers on hard problems, but it costs more, because the thinking tokens are billed.

How billing works

Thinking tokens are billed at the output rate (e.g., Sonnet 4.6: $15/M, Opus 4.7: $75/M).
You set a maximum thinking budget per request.
The model uses up to that budget; unused tokens aren't billed.
Thinking tokens are not visible to the end user; they appear only in the response object, where you can inspect them.
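The billing rules above can be sketched as a small function. This is a minimal illustration, not an official formula: the function name and parameters are hypothetical, and the rate is whatever output price applies to your model.

```python
def thinking_cost_usd(
    thinking_tokens_used: int,
    budget_tokens: int,
    output_rate_per_million: float,
) -> float:
    """Estimate the thinking cost of one request (hypothetical helper).

    Assumes, as described above, that thinking tokens are billed at the
    output rate and that usage never exceeds the budget you set.
    """
    # Only tokens actually generated are billed, capped at the budget.
    billed_tokens = min(thinking_tokens_used, budget_tokens)
    return billed_tokens * output_rate_per_million / 1_000_000


# A request that used 1,500 of a 4,000-token budget at $15/M output:
cost = thinking_cost_usd(1_500, 4_000, 15.0)
```

Because unused budget is free, the budget acts as a ceiling on cost rather than a fixed charge.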

Enable extended thinking when

Multi-step reasoning problems. Math, logic puzzles, debugging from error traces.
Constraint-satisfaction tasks. "Write code that satisfies all of these requirements simultaneously."
Novel-domain analysis. Cases where the model would otherwise pattern-match to something almost-right.

Skip extended thinking when

Templated extraction. The schema constrains the answer; thinking won't add value.
Latency-sensitive chat. Thinking typically adds 2–10 s of latency.
Bulk classification. Pay for thinking on 1M docs and the thinking cost will dwarf the cost of the responses themselves.
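The two lists above can be turned into a simple routing rule. The sketch below is one possible encoding; the `TaskKind` categories and the `should_enable_thinking` function are hypothetical names chosen for illustration, and a real workload would likely need finer distinctions.

```python
from enum import Enum, auto


class TaskKind(Enum):
    # Categories taken from the lists above; the names are illustrative.
    MULTI_STEP_REASONING = auto()
    CONSTRAINT_SATISFACTION = auto()
    NOVEL_DOMAIN_ANALYSIS = auto()
    TEMPLATED_EXTRACTION = auto()
    BULK_CLASSIFICATION = auto()
    CHAT = auto()


# Task kinds where extended thinking is expected to pay off.
THINKING_WORTHWHILE = {
    TaskKind.MULTI_STEP_REASONING,
    TaskKind.CONSTRAINT_SATISFACTION,
    TaskKind.NOVEL_DOMAIN_ANALYSIS,
}


def should_enable_thinking(kind: TaskKind, latency_sensitive: bool = False) -> bool:
    """Return True when extended thinking is likely worth its cost."""
    # A tight latency requirement overrides the quality benefit.
    if latency_sensitive:
        return False
    return kind in THINKING_WORTHWHILE
```

Treating latency as an override keeps the rule conservative: a hard problem in a latency-sensitive chat still runs in standard mode unless you decide otherwise.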

How much does it cost?

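A typical 4k-token thinking budget on Sonnet 4.6 adds up to 4,000 × $15/M = $0.06 per request. On Opus 4.7 the same budget costs up to 4,000 × $75/M = $0.30 per request. For a workload of 100k requests per month, that's as much as $6,000 of pure thinking cost on Sonnet and $30,000 on Opus. These figures assume every request uses its full budget, so they are upper bounds; requests that think less cost less.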
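The arithmetic above can be reproduced with a short projection function. This is a sketch under stated assumptions: the function and its `utilization` parameter (the average fraction of the budget actually used) are hypothetical, and the rates are the ones quoted in this article.

```python
def monthly_thinking_cost_usd(
    requests_per_month: int,
    budget_tokens: int,
    output_rate_per_million: float,
    utilization: float = 1.0,
) -> float:
    """Project monthly thinking spend (hypothetical helper).

    utilization is the assumed average share of the budget consumed per
    request; 1.0 gives the worst case where every request uses it all.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    tokens = requests_per_month * budget_tokens * utilization
    return tokens * output_rate_per_million / 1_000_000


# Worst case for 100k requests/month with a 4k budget, at the quoted rates.
sonnet_max = monthly_thinking_cost_usd(100_000, 4_000, 15.0)
opus_max = monthly_thinking_cost_usd(100_000, 4_000, 75.0)
```

Lowering `utilization` to match the average thinking usage you observe in practice gives a more realistic estimate than the worst case.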

Picking a budget

For most tasks, 2k–8k thinking tokens is enough. Going beyond 16k rarely improves output and burns budget. Start with 4k and tune from there.
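One way to do the tuning is to run your own evaluation set at a few budgets and keep the smallest budget whose score is close to the best. The sketch below assumes you already have such scores; `pick_budget` and its `tolerance` parameter are hypothetical names, and the scoring method is left to you.

```python
from typing import Mapping


def pick_budget(scores_by_budget: Mapping[int, float], tolerance: float = 0.01) -> int:
    """Return the smallest budget scoring within `tolerance` of the best.

    scores_by_budget maps a thinking budget (in tokens) to a quality
    score you measured yourself, where higher is better.
    """
    if not scores_by_budget:
        raise ValueError("need at least one measured budget")
    best = max(scores_by_budget.values())
    # Budgets whose quality is indistinguishable from the best, per tolerance.
    good_enough = [b for b, s in scores_by_budget.items() if s >= best - tolerance]
    return min(good_enough)
```

Choosing the smallest acceptable budget, rather than the best-scoring one, reflects the point above that extra thinking beyond a certain size rarely improves output.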

Use the Claude Cost Calculator to estimate the monthly impact of enabling extended thinking on your workload.

Frequently asked questions

Are extended-thinking tokens billed?

Yes. Thinking tokens are billed at the output rate of the model in use — $15/M on Sonnet 4.6, $75/M on Opus 4.7. The model uses up to the budget you set; unused tokens aren't billed.

Does extended thinking work with prompt caching?

Yes. The cache applies to the input prompt as usual. Thinking tokens themselves are not cached; they are regenerated on each request. If your workload reuses the same system prompt, you still get cache savings on input.

What's a sensible extended-thinking budget?

Start with 4,000 tokens for most reasoning tasks. Increase to 8,000–16,000 only for genuinely hard multi-step problems (debugging gnarly bugs, complex math). Beyond 16,000 the marginal quality gain is small for most tasks.
