Complete guide to Anthropic's Claude computer use API in 2026. Learn how to pass screenshots, handle tool calls, and build desktop/web automation pipelines with Claude.
Claude computer use lets the model observe a desktop screenshot and emit tool calls that your code executes — click, type, scroll, screenshot — in a loop until the task is done. It ships as three built-in tools in the Messages API: computer, text_editor, and bash.
Each screenshot goes back to the model as a tool_result image block. Claude replies with tool calls such as computer(action="screenshot"), computer(action="left_click", coordinate=[x,y]), etc.

Pass computer-use-2024-10-22 as a beta header and include the built-in tools in your request:
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
        "display_number": 1
    },
    {
        "type": "text_editor_20241022",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Open a browser, go to example.com, and take a screenshot."
        }
    ],
    betas=["computer-use-2024-10-22"]
)
import base64
import subprocess

# Reuses `client` and `tools` from the request above.
MAX_STEPS = 50  # hard cap so a confused model cannot loop forever


def take_screenshot():
    # Capture the root X11 window with ImageMagick's `import`, return base64 PNG.
    result = subprocess.run(
        ["import", "-window", "root", "png:-"], capture_output=True, check=True
    )
    return base64.standard_b64encode(result.stdout).decode("utf-8")


def handle_tool_call(tool_name, tool_input):
    # Returns a list of content blocks for the tool_result.
    if tool_name != "computer":
        return [{"type": "text", "text": f"Unsupported tool: {tool_name}"}]
    action = tool_input["action"]
    if action == "left_click":
        x, y = tool_input["coordinate"]
        subprocess.run(["xdotool", "mousemove", str(x), str(y), "click", "1"])
    elif action == "type":
        subprocess.run(["xdotool", "type", "--", tool_input["text"]])
    elif action != "screenshot":
        return [{"type": "text", "text": f"Unsupported action: {action}"}]
    # Every handled action answers with a fresh screenshot.
    return [{
        "type": "image",
        "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()},
    }]


messages = [{"role": "user", "content": "Your task here..."}]

for _ in range(MAX_STEPS):
    response = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
        betas=["computer-use-2024-10-22"]
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break  # task finished, or the model stopped without requesting a tool

    # Answer every tool_use block in this turn with a matching tool_result.
    tool_results = [
        {
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": handle_tool_call(block.name, block.input),
        }
        for block in response.content
        if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})
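The handler above only covers the computer tool, while the request also registers bash. A minimal sketch of a bash handler might look like the following. It assumes the tool input carries a `command` string (and optionally a `restart` flag); the function name `run_bash_tool`, the timeout, and the message strings are hypothetical. It also runs each command in a fresh process rather than a persistent shell session, which is a simplification.

```python
import subprocess


def run_bash_tool(tool_input, timeout_s=30):
    """Run one bash tool call and return content blocks for a tool_result.

    Simplification: each command runs in a fresh `bash -c` process, so
    state such as `cd` or exported variables does not persist between calls.
    """
    if tool_input.get("restart"):
        # Nothing to restart in this stateless sketch.
        return [{"type": "text", "text": "Shell restarted."}]
    try:
        proc = subprocess.run(
            ["bash", "-c", tool_input["command"]],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return [{"type": "text", "text": f"Command timed out after {timeout_s}s"}]
    output = proc.stdout + proc.stderr
    if not output:
        output = f"(no output, exit code {proc.returncode})"
    return [{"type": "text", "text": output}]
```

Because the model chooses the commands, a handler like this belongs inside a sandboxed VM or container, not on a machine that holds credentials or data you care about.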
| Action | Parameters | Effect |
|---|---|---|
| screenshot | none | Returns current screen as base64 PNG |
| left_click | coordinate [x, y] | Mouse click at pixel coordinates |
| right_click | coordinate [x, y] | Right-click (context menu) |
| double_click | coordinate [x, y] | Double-click |
| type | text (string) | Type characters at current focus |
| key | text (e.g. "ctrl+c") | Keyboard shortcut |
| scroll | coordinate, scroll_direction, scroll_amount | Mouse wheel scroll |
| mouse_move | coordinate [x, y] | Move cursor without clicking |
| left_click_drag | start_coordinate, coordinate | Click-and-drag |
| cursor_position | none | Returns current [x, y] cursor position |
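
The pointer and keyboard rows of the table map fairly directly onto xdotool commands. The sketch below builds the argument list for each action without running it, which keeps the mapping easy to test. The function name `xdotool_argv` is hypothetical, the button numbers (1 = left, 3 = right) follow the usual X11 convention, and `screenshot` and `scroll` are left out on purpose.

```python
def xdotool_argv(tool_input):
    """Return the xdotool command (as an argv list) for one computer action."""
    action = tool_input["action"]

    def move_to(coordinate):
        # xdotool takes x and y as separate string arguments.
        x, y = coordinate
        return ["mousemove", str(x), str(y)]

    if action == "left_click":
        return ["xdotool", *move_to(tool_input["coordinate"]), "click", "1"]
    if action == "right_click":
        return ["xdotool", *move_to(tool_input["coordinate"]), "click", "3"]
    if action == "double_click":
        return ["xdotool", *move_to(tool_input["coordinate"]),
                "click", "--repeat", "2", "1"]
    if action == "mouse_move":
        return ["xdotool", *move_to(tool_input["coordinate"])]
    if action == "type":
        return ["xdotool", "type", "--", tool_input["text"]]
    if action == "key":
        return ["xdotool", "key", "--", tool_input["text"]]
    if action == "left_click_drag":
        # Press at the start point, move, then release at the end point.
        return ["xdotool", *move_to(tool_input["start_coordinate"]), "mousedown", "1",
                *move_to(tool_input["coordinate"]), "mouseup", "1"]
    if action == "cursor_position":
        return ["xdotool", "getmouselocation"]
    raise ValueError(f"Unsupported action: {action}")
```

Passing the result to `subprocess.run(...)` executes the action. Building an argv list instead of a shell string means text typed by the model is never interpreted by a shell.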
Computer use works with Claude Sonnet 4.6 and Claude Opus 4.7. Sonnet 4.6 is the recommended default: it is significantly faster than Opus and 5× cheaper, which matters in agentic loops that can take 20–50 API calls per task. Use Opus only for tasks that require complex multi-step reasoning over ambiguous UIs. Estimate cost with the Claude Cost Calculator.
Set max_tokens and iteration limits to cap runaway loops.

A typical 20-step computer use task (a screenshot at every step) uses roughly 50k input tokens (mostly image data) and 1k output tokens. At Sonnet 4.6 rates, that is about $0.165 per task. A complex 50-step task costs about $0.40.
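
The per-task figure can be reproduced with a few lines of arithmetic. The rates below ($3 per million input tokens, $15 per million output tokens) are an assumption, chosen because they reproduce the ~$0.165 estimate; check current pricing before relying on them. The function name `estimate_task_cost` is hypothetical.

```python
# Assumed rates in USD per million tokens; they reproduce the ~$0.165 figure.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def estimate_task_cost(input_tokens, output_tokens,
                       input_rate=INPUT_USD_PER_MTOK,
                       output_rate=OUTPUT_USD_PER_MTOK):
    """Return the estimated cost in USD for one task."""
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


# 20-step task: 50k input tokens and 1k output tokens.
cost = estimate_task_cost(50_000, 1_000)
print(f"${cost:.3f}")  # $0.150 input + $0.015 output
```

Input tokens dominate the total, so the number of screenshots sent per task is the main lever on cost.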