Claude API Conversation History

How to implement multi-turn conversation history with the Claude API. Maintain context, manage token limits, and avoid common mistakes in Python and JavaScript.


The Claude API is stateless — it does not remember previous messages automatically. To build a multi-turn chat, you must send the full conversation history with every request. This page shows you how to do it correctly in Python and JavaScript.

How conversation history works

Each call to messages.create receives a messages array of alternating user and assistant turns. Claude sees the entire array and responds as if continuing the conversation.

import anthropic

client = anthropic.Anthropic()

history = []

def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is the capital of France?"))
print(chat("What is its population?"))  # Claude knows "its" = Paris

JavaScript / TypeScript version

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const history: Anthropic.MessageParam[] = [];

async function chat(userInput: string): Promise<string> {
    history.push({ role: "user", content: userInput });
    const response = await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        messages: history
    });
    const reply = (response.content[0] as Anthropic.TextBlock).text;
    history.push({ role: "assistant", content: reply });
    return reply;
}

await chat("Name a popular Python framework.");
await chat("What is it used for?"); // context maintained

With a system prompt

Pass the system prompt as the top-level system parameter — not as the first message. The messages array should only contain user and assistant turns.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise coding assistant. Reply in under 3 sentences.",
    messages=history
)

Managing token limits

Each new turn re-sends the full history, so as a conversation grows you will either hit Claude's 200,000-token context window or pay for old tokens you no longer need. Three strategies:

Strategy | How | Trade-off
Sliding window | Keep only the last N turns: messages[-20:] | Cheapest; older context lost
Summarise old turns | Ask Claude to compress old history into a single summary message | Context preserved; one extra API call
Prompt caching | Mark a stable prefix (e.g. a long document context) with cache_control | 90% cost reduction on the cached portion
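The prompt caching row above can be sketched as follows. This is a minimal sketch, assuming the stable prefix is a long reference document placed in the system prompt. build_cached_request and the sample document text are hypothetical names used for illustration; the cache_control field with type "ephemeral" is the marker the Messages API accepts on content blocks. The helper only builds the request arguments, so nothing is sent until they are passed to client.messages.create.

```python
from typing import Any


def build_cached_request(document_text: str, history: list[dict[str, Any]]) -> dict[str, Any]:
    """Build messages.create arguments with a cacheable system prefix.

    Hypothetical helper: the stable document goes in a system block marked
    with cache_control, and the changing conversation goes in messages.
    """
    return {
        "model": "claude-sonnet-4-6",  # model name taken from the examples on this page
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": document_text,
                # Marks the end of the prefix that may be cached and reused.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": list(history),  # copy so the caller's list is not shared
    }


request = build_cached_request(
    "Full text of a long reference document...",
    [{"role": "user", "content": "Summarise section 2."}],
)
print(request["system"][0]["cache_control"])
```

Calling client.messages.create(**request) would send it. Caching is only expected to take effect once the marked prefix exceeds a model-dependent minimum length, so a short system prompt would likely be billed as normal input.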

Sliding window example

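MAX_TURNS = 20  # counts individual messages, not user/assistant pairs

def chat_windowed(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    window = history[-MAX_TURNS:]  # keep only the most recent messages
    # The API expects the first message to be a user turn, so drop a
    # leading assistant message left over from slicing.
    if window[0]["role"] == "assistant":
        window = window[1:]
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=window
    )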
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
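
Summarise old turns example

The "summarise old turns" strategy from the table could look roughly like this. It is a sketch under stated assumptions, not a definitive implementation: compact_history, keep_last, and the summarise callback are hypothetical names, and the callback stands in for whatever produces the summary text (typically one extra Claude API call). The summary is inserted as a user message followed by a short assistant acknowledgement so that roles keep alternating.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    history: list[Message],
    summarise: Callable[[list[Message]], str],
    keep_last: int = 6,
) -> list[Message]:
    """Replace all but roughly the last `keep_last` messages with one summary.

    `summarise` is a hypothetical callback that turns old messages into a
    short text summary, for example by making a separate Claude API call.
    """
    if len(history) <= keep_last:
        return list(history)

    split = len(history) - keep_last
    # Move the split forward so the kept tail starts on a user message.
    while split < len(history) and history[split]["role"] != "user":
        split += 1

    old, recent = history[:split], history[split:]
    summary = summarise(old)
    return [
        {"role": "user", "content": f"Summary of our earlier conversation: {summary}"},
        # A short assistant turn keeps user/assistant roles alternating.
        {"role": "assistant", "content": "Understood."},
        *recent,
    ]


# Demo with a fake summariser so the example runs without an API key.
demo = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"message {i}"}
    for i in range(10)
]
compacted = compact_history(demo, lambda old: f"{len(old)} earlier messages", keep_last=4)
print(len(compacted))
```

In practice summarise would send the old messages to client.messages.create together with an instruction to condense them, which is the one extra API call noted in the table.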

Common mistakes

Estimate the token cost of your conversation history with the Claude Cost Calculator. For high-volume chat workloads, the Prompt-Pricing Recommender helps choose the right tier.
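
For a rough back-of-envelope estimate, the arithmetic behind history growth can be sketched directly. The two functions below are hypothetical helpers that assume every message has the same token count and ignore the system prompt and any caching discount; they illustrate the growth pattern and are not a tokenizer.

```python
def history_tokens(exchanges: int, tokens_per_message: int) -> int:
    """Tokens held in the history after `exchanges` user/assistant pairs."""
    return 2 * exchanges * tokens_per_message


def cumulative_input_tokens(exchanges: int, tokens_per_message: int) -> int:
    """Total input tokens sent when the full history is re-sent on every request.

    Request k sends the k-th user message plus the 2 * (k - 1) earlier messages.
    """
    return sum((2 * k - 1) * tokens_per_message for k in range(1, exchanges + 1))


print(history_tokens(25, 500))           # 25000
print(cumulative_input_tokens(25, 500))  # 312500
```

Under these assumptions, 25 exchanges of 500-token messages leave 25,000 tokens in the history, but the requests along the way send 312,500 input tokens in total. The total grows with the square of the number of exchanges, which is why trimming or caching matters for long sessions.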

Frequently asked questions

Does Claude remember previous conversations automatically?
No. The Claude API is stateless — there are no server-side sessions. You must send the full conversation history in the messages array with every request. In-app memory (summarisation, retrieval) is built on top of this stateless API.
What is the maximum conversation length in Claude?
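Claude's context window is 200,000 tokens. A 50-turn conversation at about 500 tokens per turn holds roughly 25,000 tokens of history, well within the limit. Because every request re-sends that history, the input tokens you are billed for grow faster than the history itself, so for very long sessions use a sliding window or summarise old turns to stay under budget.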
Can I mix text and images in Claude conversation history?
Yes. Each message's content field can be a string (text only) or an array of content blocks mixing text blocks and image blocks. Both image source types (base64 and URL) are supported in history. Cached image blocks are especially cost-effective in multi-turn vision workflows.
How do I persist conversation history between server restarts?
Serialize the history array to JSON and store it in a database (Redis, Postgres, or any key-value store) keyed by session ID. On each request, load the history, append the new user message, call the API, append the assistant reply, and save the updated history back.
Is sending the full history every request inefficient?
Only slightly, provided you use prompt caching. Anthropic's prompt caching feature reduces the effective cost of re-sending the same earlier content. With cache_control markers, stable portions of your request (system prompt, loaded documents) are billed at about 10% of the normal input price on requests that hit the cache; the request that first writes the cache does not get the discount. The variable recent turns are always billed at full price.
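
The load, append, call, save cycle described in the persistence answer above can be sketched with the standard library alone. This is a minimal sketch, assuming one JSON file per session ID stands in for a real database such as Redis or Postgres; load_history, save_history, and the chat_sessions directory are hypothetical names chosen for illustration.

```python
import json
from pathlib import Path

STORE_DIR = Path("chat_sessions")  # hypothetical location; swap for a real database in production


def _session_path(session_id: str, store_dir: Path) -> Path:
    # Keep only safe characters so a session ID cannot escape the directory.
    safe_id = "".join(c for c in session_id if c.isalnum() or c in "-_")
    if not safe_id:
        raise ValueError("session_id must contain letters, digits, '-' or '_'")
    return store_dir / f"{safe_id}.json"


def load_history(session_id: str, store_dir: Path = STORE_DIR) -> list[dict]:
    """Return the saved message list, or an empty list for a new session."""
    path = _session_path(session_id, store_dir)
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_history(session_id: str, history: list[dict], store_dir: Path = STORE_DIR) -> None:
    """Write the message list as JSON, creating the directory if needed."""
    store_dir.mkdir(parents=True, exist_ok=True)
    path = _session_path(session_id, store_dir)
    path.write_text(json.dumps(history, ensure_ascii=False), encoding="utf-8")
```

A request handler would call load_history, append the new user message, call the API, append the assistant reply, and then call save_history. A real deployment would also need locking or transactions so that two concurrent requests for the same session do not overwrite each other.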
