How to implement multi-turn conversation history with the Claude API. Maintain context, manage token limits, and avoid common mistakes in Python and JavaScript.
The Claude API is stateless — it does not remember previous messages automatically. To build a multi-turn chat, you must send the full conversation history with every request. This page shows you how to do it correctly in Python and JavaScript.
Each call to messages.create receives a messages array of alternating user and assistant turns. Claude sees the entire array and responds as if continuing the conversation.
```python
import anthropic

client = anthropic.Anthropic()
history = []


def chat(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply


print(chat("What is the capital of France?"))
print(chat("What is its population?"))  # Claude knows "its" = Paris
```
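One gap in `chat()` above: the user turn is appended before the request is made, so if `messages.create` raises (a network error or a rate limit, for example), `history` is left ending in an unanswered user message that gets resent with the next question. A minimal sketch of one way to guard against that, assuming you pass the API call in as a function; `chat_with_rollback` and `send` are hypothetical names used only for illustration:

```python
from typing import Callable

Message = dict[str, str]


def chat_with_rollback(
    history: list[Message],
    user_input: str,
    send: Callable[[list[Message]], str],
) -> str:
    """Append a user turn, call `send`, and undo the append if the call fails.

    `send` is a hypothetical stand-in for the API call: it receives the full
    history and returns the assistant's reply text.
    """
    history.append({"role": "user", "content": user_input})
    try:
        reply = send(history)
    except Exception:
        history.pop()  # drop the unanswered user turn so roles keep alternating
        raise
    history.append({"role": "assistant", "content": reply})
    return reply
```

With the client from the example above, `send` could be `lambda msgs: client.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=msgs).content[0].text`. Injecting the call also makes the history logic easy to test without network access.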
```typescript
// Top-level await requires an ES module (e.g. "type": "module" in package.json).
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const history: Anthropic.MessageParam[] = [];

async function chat(userInput: string): Promise<string> {
  history.push({ role: "user", content: userInput });
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: history,
  });
  const reply = (response.content[0] as Anthropic.TextBlock).text;
  history.push({ role: "assistant", content: reply });
  return reply;
}

console.log(await chat("Name a popular Python framework."));
console.log(await chat("What is it used for?")); // context maintained
```
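Both versions resend the whole history on every call, so the input you are billed for grows faster than the conversation itself. A rough sketch of the arithmetic, assuming (hypothetically) that every message is the same size and ignoring the system prompt: turn k sends k user messages plus k - 1 assistant replies, 2k - 1 messages in all, and summing 2k - 1 over n turns gives n² messages' worth of input.

```python
def input_tokens_for_turn(turn: int, tokens_per_message: int) -> int:
    """Input tokens sent on a given 1-indexed turn.

    Turn k resends k user messages and k - 1 assistant replies.
    Assumes every message is exactly `tokens_per_message` tokens (hypothetical).
    """
    return (2 * turn - 1) * tokens_per_message


def cumulative_input_tokens(turns: int, tokens_per_message: int) -> int:
    """Total input tokens billed across `turns` turns of a full-history chat."""
    return sum(input_tokens_for_turn(k, tokens_per_message) for k in range(1, turns + 1))


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, cumulative_input_tokens(n, 100))
```

With 100-token messages, 10 turns bill 10,000 input tokens in total while 100 turns bill 1,000,000: ten times the turns, a hundred times the input.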
Pass the system prompt as the top-level system parameter — not as the first message. The messages array should only contain user and assistant turns.
```python
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a concise coding assistant. Reply in under 3 sentences.",
    messages=history,
)
```
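The `system` parameter also accepts a list of text blocks instead of a plain string, which is where `cache_control` goes for the prompt-caching strategy described in the table below. A sketch, assuming you have a long, stable reference document you want cached across turns; `build_request` and `reference_doc` are hypothetical names. Prompts shorter than the model's minimum cacheable length are simply processed without caching.

```python
def build_request(reference_doc: str, history: list[dict]) -> dict:
    """Build keyword arguments for client.messages.create with a cached system prefix."""
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": "You are a concise coding assistant. Reply in under 3 sentences.",
            },
            {
                "type": "text",
                "text": reference_doc,
                # The prompt prefix up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        # The conversation itself still grows; only the stable prefix is cached.
        "messages": history,
    }
```

You would then call `client.messages.create(**build_request(doc, history))` and check `response.usage.cache_creation_input_tokens` and `response.usage.cache_read_input_tokens` to see whether the prefix was written to or read from the cache.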
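Each new turn resends the full history, so the input you are billed for grows with every exchange. In long conversations you'll eventually hit the model's context window (200k tokens for most Claude models) or keep paying to resend old tokens you no longer need. Three strategies: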
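| Strategy | How | Trade-off |
|---|---|---|
| Sliding window | Send only the most recent N messages (e.g. the last 20), dropping a leading assistant message so the window starts on a user turn | Cheapest; older context is lost |
| Summarise old turns | Ask Claude to compress older turns into a single summary message | Most context preserved, but lossy; one extra API call per summary |
| Prompt caching | Mark a stable prefix (e.g. a long document context) with cache_control | Cache reads cost 10% of the base input price (a 90% saving on the cached portion); cache writes cost 25% more than base input, and the context itself does not shrink |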
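```python
MAX_MESSAGES = 20  # roughly the last 10 user/assistant exchanges


def chat_windowed(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    window = history[-MAX_MESSAGES:]  # send only the most recent messages
    if window[0]["role"] == "assistant":
        window = window[1:]  # the first message sent must be a user turn
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=window,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})  # full history is still kept locally
    return reply
```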
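For the summarise strategy in the table, one approach is to replace everything except the most recent messages with a single user message that holds a summary. A minimal sketch, assuming a hypothetical `summarize` callable that takes the old messages and returns summary text (in practice, one extra `messages.create` call asking Claude to condense them). The cut point moves forward until the kept slice starts with an assistant turn, so the summary slots in as the user turn in front of it and roles still alternate.

```python
from typing import Callable

Message = dict[str, str]


def compact_history(
    history: list[Message],
    keep_last: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Replace all but roughly the last `keep_last` messages with one summary message.

    Assumes `history` starts with a user turn and alternates user/assistant.
    `summarize` is a hypothetical callable, e.g. a wrapper around one extra
    messages.create call that asks Claude to condense the old turns.
    """
    if len(history) <= keep_last:
        return list(history)

    cut = len(history) - keep_last
    # Advance the cut until the kept slice starts with an assistant turn, so the
    # summary (a user turn) can sit directly in front of it.
    while cut < len(history) and history[cut]["role"] != "assistant":
        cut += 1
    if cut == len(history):
        return list(history)  # nothing suitable to summarise

    old, recent = history[:cut], history[cut:]
    summary = {
        "role": "user",
        "content": "Summary of our earlier conversation:\n" + summarize(old),
    }
    return [summary] + recent
```

Assigning with `history[:] = compact_history(history, 20, summarize)` once the history passes some threshold keeps the same list object, so the earlier `chat()` function continues to work unchanged.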
Common mistakes to avoid:

- The first item in `messages` must be a user turn. Claude will return a 400 error if you lead with `assistant`.
- The `system` param is separate. Adding the system prompt as a user message wastes tokens and blurs your instructions into the user's side of the conversation.
- Forgetting `history.append(assistant_turn)` drops Claude's own replies from the history, so every request sees only your side of the conversation.
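Each of the mistakes above shows up as a malformed `messages` list, so it can be caught before a request is sent. A sketch of a hypothetical `check_history` helper that enforces the conventions this page relies on: a user turn first, no `system` role inside `messages`, and strict user/assistant alternation. It is a local check, not part of the SDK, and may be stricter than the API itself.

```python
Message = dict[str, str]


def check_history(messages: list[Message]) -> None:
    """Raise ValueError if `messages` breaks the conventions used on this page.

    A local sanity check only; it may be stricter than what the API accepts.
    """
    if not messages:
        raise ValueError("messages is empty")
    for i, message in enumerate(messages):
        role = message.get("role")
        if role == "system":
            raise ValueError(
                f"message {i}: pass the system prompt via the top-level system parameter"
            )
        # Even positions must be user turns, odd positions assistant turns.
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"message {i}: expected role {expected!r}, got {role!r}")
```

Calling `check_history(history)` just before `messages.create` turns a forgotten append or a misplaced system prompt into an immediate, readable error instead of a confusing reply or a 400 response.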