
Lesson 14: Extended Thinking & Adaptive Reasoning

What Is Extended Thinking?

Extended thinking is Claude's ability to reason through a problem step-by-step before producing a final answer — thinking out loud in a scratchpad that you can optionally observe. It's the difference between asking Claude to answer immediately versus giving it time to work through the problem first.

When extended thinking is enabled, Claude generates an internal chain of reasoning that is returned as "thinking" content blocks. The final response is informed by this reasoning, and the thinking itself can be streamed, displayed, or discarded depending on your needs.

Note: Extended thinking is an API-level feature. It isn't available in every interface, and the thinking tokens incur additional cost.


When Extended Thinking Helps

Extended thinking improves performance on problems that have these characteristics:

Complex multi-step reasoning: problems where the solution requires multiple logical steps that each depend on the previous one, such as mathematical proofs, algorithm design, and step-by-step debugging.

Ambiguous or underspecified problems: when there are multiple valid interpretations of a problem, extended thinking lets Claude explore them and choose the most defensible path before committing.

Tradeoff analysis: architecture decisions, performance vs. maintainability choices, security vs. usability tradeoffs — problems where the answer depends on careful weighing of competing factors.

Code debugging with subtle causes: bugs caused by timing issues, state mutation, or incorrect assumptions about library behavior — problems where jumping to an answer often produces wrong hypotheses.


How to Enable It

Enable extended thinking by setting the thinking parameter in your API call. You set a budget_tokens value — the maximum number of tokens Claude can spend on internal reasoning. The budget must be smaller than max_tokens so there is room left for the final answer.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",  # Extended thinking works best on Opus
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Claude can use up to 10K tokens to think
    },
    messages=[{
        "role": "user",
        "content": "Prove that there are infinitely many prime numbers."
    }]
)

# Response contains both thinking and text blocks
for block in response.content:
    if block.type == "thinking":
        print("REASONING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)

Streaming Thinking Output

For interactive applications, you can stream thinking tokens as they're generated:

with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Debug this race condition: ..."}]
) as stream:
    for event in stream:
        # Thinking deltas carry the reasoning; text deltas carry the final answer.
        if event.type == 'content_block_start':
            if event.content_block.type == 'thinking':
                print("\n[Thinking...]")
        elif event.type == 'content_block_delta':
            if event.delta.type == 'thinking_delta':
                print(event.delta.thinking, end='', flush=True)
            elif event.delta.type == 'text_delta':
                print(event.delta.text, end='', flush=True)
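
If you also want the assembled response after streaming (for example, to log the final answer), the SDK's stream helper can return it once the events have been consumed. A minimal sketch, placed inside the same with block after the event loop:

    # Once the loop above has drained the stream, get_final_message() returns
    # the accumulated Message, including both thinking and text blocks.
    final_message = stream.get_final_message()
    answer = "".join(b.text for b in final_message.content if b.type == "text")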

Budget Tokens: How Much to Allocate

The budget_tokens value controls how much Claude can spend on reasoning. More isn't always better.

Problem Complexity                      Suggested Budget
Simple multi-step math                  1,000–2,000
Moderate reasoning / debugging          4,000–8,000
Complex architecture / proofs           10,000–16,000
Research synthesis / hard algorithms    20,000+

Claude won't always use the full budget — it only thinks as much as the problem warrants. Setting a high ceiling doesn't force unnecessary computation.
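
One way to apply these tiers is to pick the budget per request in your own code. The sketch below is illustrative; the tier names and numbers are my own, not part of the API:

# Illustrative budget tiers; budget_tokens must stay below max_tokens.
THINKING_BUDGETS = {
    "simple": 2000,
    "moderate": 8000,
    "complex": 16000,
    "research": 24000,
}

def create_with_thinking(client, prompt, tier="moderate"):
    budget = THINKING_BUDGETS[tier]
    return client.messages.create(
        model="claude-opus-4-5",
        max_tokens=budget + 4000,  # leave headroom for the final answer
        thinking={"type": "enabled", "budget_tokens": budget},
        messages=[{"role": "user", "content": prompt}],
    )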


Cost Implications

Thinking tokens are billed as output tokens. If Claude thinks for 8,000 tokens and produces a 1,000-token final answer, you're billed for 9,000 output tokens (8,000 + 1,000) plus your normal input tokens.

Cost example (Claude Opus, approximate):
- Input: 500 tokens × $15/M = $0.0075
- Thinking: 8,000 tokens × $75/M = $0.60
- Output: 1,000 tokens × $75/M = $0.075
Total: ~$0.68 per call

Compare to a non-thinking call:

- Input: 500 tokens × $15/M = $0.0075
- Output: 1,000 tokens × $75/M = $0.075
Total: ~$0.08 per call

In this example, extended thinking increases the per-call cost several times over; the exact multiple depends on how many thinking tokens Claude actually uses. Use it when the quality improvement is worth it, not by default.
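
To make the arithmetic above reusable, here is a small estimator. The per-million-token rates are assumptions taken from the example; substitute the current pricing for the model you actually use:

# Rough cost estimator. Rates are placeholders (USD per million tokens).
INPUT_RATE = 15.00
OUTPUT_RATE = 75.00  # thinking tokens are billed at the output rate

def estimate_cost(input_tokens, thinking_tokens, output_tokens):
    input_cost = input_tokens / 1_000_000 * INPUT_RATE
    output_cost = (thinking_tokens + output_tokens) / 1_000_000 * OUTPUT_RATE
    return input_cost + output_cost

print(estimate_cost(500, 8000, 1000))  # ~0.68 with thinking
print(estimate_cost(500, 0, 1000))     # ~0.08 without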


Practical Examples

Debugging a Race Condition

messages = [{
    "role": "user",
    "content": """
    This Node.js code occasionally returns stale data. It uses a cache with a 
    60-second TTL. Under load (>100 req/s), about 2% of requests return 
    data from the previous cache entry even after a forced refresh.
    
    [code snippet]
    
    Walk through the possible failure modes and identify the root cause.
    """
}]

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,  # must be greater than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 12000},
    messages=messages
)

The thinking trace will walk through race conditions, locking issues, and event loop behavior in detail before the final answer commits to the most likely cause.

Complex Refactoring Plan

For any refactoring that spans multiple files and has order-of-operations constraints, extended thinking helps Claude reason about dependency order, migration paths, and rollback strategies before generating the plan.
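
For example, a refactoring-plan request might look like the following; the prompt text and token numbers are illustrative, with the budget drawn from the "complex" tier above:

messages = [{
    "role": "user",
    "content": """
    Plan a refactor that extracts our payment logic into its own module.
    It currently spans six files with circular imports. Produce an ordered
    migration plan with rollback steps for each stage.
    """
}]

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=20000,
    thinking={"type": "enabled", "budget_tokens": 16000},
    messages=messages
)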


When NOT to Use Extended Thinking

  • Simple factual lookups: No benefit for "what's the syntax for X?"
  • Boilerplate generation: Code templates, CRUD scaffolding, trivial transformations
  • Latency-sensitive applications: Thinking adds perceptible delay
  • High-volume batch processing: Thinking tokens make large-scale jobs expensive
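
A simple way to follow this guidance is to opt in per request rather than enabling thinking globally. Below is a minimal sketch; the is_complex flag stands in for whatever heuristic your application uses:

def answer(client, prompt, is_complex=False):
    # Only pay for thinking tokens when the request warrants it.
    kwargs = {}
    if is_complex:
        kwargs["thinking"] = {"type": "enabled", "budget_tokens": 8000}
    return client.messages.create(
        model="claude-opus-4-5",
        max_tokens=16000,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )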

Key Takeaways

  • Extended thinking gives Claude a reasoning scratchpad before it commits to an answer
  • Enable it with the thinking parameter and a budget_tokens value
  • Best used with Claude Opus on genuinely complex or ambiguous problems
  • Thinking tokens cost money; they're billed at the output-token rate
  • You can stream and display thinking for transparency in your application
  • Don't enable it by default; reserve it for problems where the quality improvement justifies the cost