Extended thinking lets Claude reason through a problem before answering. The reasoning trace is real output, and Anthropic bills it as output tokens.
How it's billed
Thinking tokens count as output tokens at the model's standard rate ($75/M on Opus, $15/M on Sonnet).
You set a budget_tokens cap on the thinking phase to bound cost.
The final user-visible answer is billed on top of the thinking tokens.
The cost math
A typical hard reasoning task uses 3–10k thinking tokens. On Sonnet 4.6, that's $0.045 – $0.15 per request just for the reasoning. On Opus 4.7, $0.225 – $0.75. Plan accordingly.
When it's worth it
Math and logic problems where chain-of-thought meaningfully helps.
Code review where the model needs to trace through control flow.
When it isn't
Single-shot extraction or classification — thinking adds latency and cost with no quality lift.
Creative writing — the model overthinks.
Estimate
The Cost Calculator has a "thinking tokens per request" field — add an estimate to see total impact.
Frequently asked questions
How much do Claude thinking tokens cost?
Thinking tokens are billed as output tokens. On Sonnet 4.6 they cost $15 per million; on Opus 4.7 they cost $75 per million. A typical hard reasoning task uses 3,000–10,000 thinking tokens, costing $0.045–$0.15 on Sonnet and $0.225–$0.75 on Opus per request.
Can I cap the number of thinking tokens Claude uses?
Yes — the budget_tokens parameter sets a hard upper limit on thinking tokens per request. Setting it to 2,000 for simpler tasks and 8,000 for harder ones is a common pattern that keeps extended thinking cost predictable.
Is extended thinking available via the Batch API?
Yes. Extended thinking works with both the standard Messages API and the Batch API. Thinking tokens are still billed as output tokens; the 50% batch discount applies to them as well.