Claude Computer Use API: Automate Desktop and Web Tasks

Complete guide to Anthropic's Claude computer use API in 2026. Learn how to pass screenshots, handle tool calls, and build desktop/web automation pipelines with Claude.


Claude computer use lets the model observe a desktop screenshot and emit tool calls that your code executes (click, type, scroll, screenshot) in a loop until the task is done. It ships as three Anthropic-defined tools in the beta Messages API: computer, a text editor (type text_editor_*, exposed under the name str_replace_editor), and bash.

How it works

  1. You send the task prompt together with the tool definitions.
  2. Claude responds with a tool_use block such as computer(action="screenshot") or computer(action="left_click", coordinate=[x,y]).
  3. Your code executes the action and returns the outcome in a tool_result block: a new screenshot as an image block, or text such as command output.
  4. The loop repeats until Claude stops requesting tools and replies with its final answer.
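A tool_result is only valid as an answer to a specific tool_use block, matched by its id, and it travels back inside a user-role message. The sketch below builds that message shape from raw PNG bytes; the helper names and the toolu_example id are hypothetical, and the placeholder bytes are not a real PNG.

```python
import base64


def screenshot_tool_result(tool_use_id, png_bytes):
    """Wrap raw PNG bytes as a tool_result answering one tool_use block (hypothetical helper)."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,  # must echo the id of Claude's tool_use block
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.standard_b64encode(png_bytes).decode("ascii"),
                },
            }
        ],
    }


def next_user_turn(tool_results):
    """Tool results go back to the API as the content of a user-role message."""
    return {"role": "user", "content": tool_results}


# Hypothetical id and placeholder bytes, only to show the structure
turn = next_user_turn([screenshot_tool_result("toolu_example", b"\x89PNG...")])
print(turn["content"][0]["tool_use_id"])  # toolu_example
```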

Enabling the tools

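Pass computer-use-2024-10-22 as a beta flag (the SDK's betas argument sets the anthropic-beta header) and include the Anthropic-defined tools in your request. These version strings come from the original October 2024 release; newer models pair with newer tool versions (for example computer_20250124 with the computer-use-2025-01-24 beta), so check Anthropic's computer-use documentation for the versions your model expects: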

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
        "display_number": 1
    },
    {
        "type": "text_editor_20241022",
        "name": "str_replace_editor"
    },
    {
        "type": "bash_20241022",
        "name": "bash"
    }
]

response = client.beta.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=4096,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Open a browser, go to example.com, and take a screenshot."
        }
    ],
    betas=["computer-use-2024-10-22"]
)
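If you call the HTTP API directly instead of using the SDK, the betas argument becomes an anthropic-beta request header. This is a minimal sketch that builds, but does not send, such a request with the standard library; the helper name is hypothetical, and the beta string defaults to the one used above (swap it for the version your model expects).

```python
import json
import os
import urllib.request


def build_computer_use_request(body, api_key, beta="computer-use-2024-10-22"):
    """Build (not send) a raw Messages API request carrying the computer-use beta header."""
    return urllib.request.Request(
        "https://api.anthropic.com/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": beta,  # what the SDK's betas=[...] argument produces
            "content-type": "application/json",
        },
        method="POST",
    )


body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "tools": [{"type": "computer_20241022", "name": "computer",
               "display_width_px": 1280, "display_height_px": 800, "display_number": 1}],
    "messages": [{"role": "user", "content": "Take a screenshot."}],
}
req = build_computer_use_request(body, os.environ.get("ANTHROPIC_API_KEY", ""))
```

Passing req to urllib.request.urlopen would send it; the SDK remains the simpler option for the agentic loop.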

The agentic loop

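import base64
import subprocess

import anthropic

client = anthropic.Anthropic()
# `tools` is the list defined in the previous example.

def take_screenshot():
    # ImageMagick's `import` captures the whole X display (root window) as PNG on stdout
    result = subprocess.run(["import", "-window", "root", "png:-"], capture_output=True, check=True)
    return base64.standard_b64encode(result.stdout).decode("utf-8")

def screenshot_block():
    return {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}

def error_result(message):
    # Unsupported calls get a text result flagged as an error, never an empty image
    return [{"type": "text", "text": message}], True

def handle_tool_call(tool_name, tool_input):
    """Execute one tool call; return (tool_result content blocks, is_error)."""
    if tool_name != "computer":
        return error_result(f"Tool not implemented in this harness: {tool_name}")
    action = tool_input["action"]
    if action == "screenshot":
        pass  # nothing to execute; the screenshot is captured below
    elif action in ("left_click", "mouse_move"):
        # Older tool versions send left_click without a coordinate (click at current position)
        if "coordinate" in tool_input:
            x, y = tool_input["coordinate"]
            subprocess.run(["xdotool", "mousemove", str(x), str(y)], check=True)
        if action == "left_click":
            subprocess.run(["xdotool", "click", "1"], check=True)
    elif action == "type":
        subprocess.run(["xdotool", "type", "--", tool_input["text"]], check=True)
    elif action == "key":
        subprocess.run(["xdotool", "key", "--", tool_input["text"]], check=True)
    else:
        return error_result(f"Action not implemented in this harness: {action}")
    # Every handled action returns a fresh screenshot so Claude can see the effect
    return [screenshot_block()], False

messages = [{"role": "user", "content": "Your task here..."}]

for _ in range(50):  # hard cap on iterations to bound cost
    response = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        tools=tools,
        messages=messages,
        betas=["computer-use-2024-10-22"],
    )
    messages.append({"role": "assistant", "content": response.content})

    # Claude is finished (or was cut off by max_tokens) once it stops requesting tools
    if response.stop_reason != "tool_use":
        break

    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            content, is_error = handle_tool_call(block.name, block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": content,
                "is_error": is_error,
            })
    messages.append({"role": "user", "content": tool_results})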
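The handler above does not execute bash tool calls. A bash tool call carries a command string (or restart: true), and your harness runs it and returns the output as text. The sketch below is a simplified, one-shot executor with a hypothetical name: each command runs in a fresh bash process, so state such as cd or exported variables does not persist between calls the way it would in a persistent shell session.

```python
import subprocess


def run_bash_tool(tool_input, timeout=60.0):
    """Run one bash tool call in a fresh process; return (tool_result content, is_error).

    Simplification: no persistent session, so `restart` is only acknowledged.
    """
    if tool_input.get("restart"):
        return [{"type": "text", "text": "bash session restarted"}], False
    try:
        proc = subprocess.run(
            ["bash", "-c", tool_input["command"]],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return [{"type": "text", "text": f"command timed out after {timeout}s"}], True
    output = proc.stdout + proc.stderr
    # A non-zero exit status is reported to Claude as an error result
    return [{"type": "text", "text": output or "(no output)"}], proc.returncode != 0
```

Inside the loop, a branch such as `if block.name == "bash": content, is_error = run_bash_tool(block.input)` would route those calls here.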

Available computer actions

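| Action | Parameters | Effect |
|---|---|---|
| screenshot | none | Captures the current screen; your code returns it as a base64 PNG |
| left_click | coordinate [x, y] | Left-click at pixel coordinates |
| right_click | coordinate [x, y] | Right-click (context menu) |
| double_click | coordinate [x, y] | Double-click |
| type | text (string) | Type characters at the current focus |
| key | text (e.g. "ctrl+c") | Press a key or keyboard shortcut |
| scroll | coordinate, scroll_direction, scroll_amount | Scroll the mouse wheel |
| mouse_move | coordinate [x, y] | Move the cursor without clicking |
| left_click_drag | start_coordinate, coordinate | Click and drag |
| cursor_position | none | Returns the current [x, y] cursor position |

Parameters vary by tool version. Under computer_20241022, click actions take no coordinate (Claude moves the mouse with mouse_move first) and scroll is not available; coordinates on clicks and the scroll action arrived with computer_20250124.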
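On Linux, most of these actions map onto a single xdotool command line. The sketch below builds (but does not run) those argument lists; the function and table names are hypothetical, and it assumes the parameter names from Anthropic's computer-tool reference (coordinate, start_coordinate, scroll_direction, scroll_amount, text). Scrolling uses X11's convention that wheel movement is delivered as clicks on buttons 4 to 7.

```python
# X11 buttons: 1 left, 2 middle, 3 right; 4/5 wheel up/down; 6/7 wheel left/right
CLICK_BUTTON = {"left_click": "1", "middle_click": "2", "right_click": "3", "double_click": "1"}
WHEEL_BUTTON = {"up": "4", "down": "5", "left": "6", "right": "7"}


def _move(coordinate):
    x, y = coordinate
    return ["mousemove", str(x), str(y)]


def xdotool_args(action, tool_input):
    """Map one computer-tool action to an xdotool argv list (built, not executed)."""
    if action == "left_click_drag":
        # Start from start_coordinate if given, otherwise from the current position
        start = _move(tool_input["start_coordinate"]) if "start_coordinate" in tool_input else []
        return ["xdotool", *start, "mousedown", "1", *_move(tool_input["coordinate"]), "mouseup", "1"]
    # Optional coordinate: move first, then act at the new position
    prefix = _move(tool_input["coordinate"]) if "coordinate" in tool_input else []
    if action in CLICK_BUTTON:
        repeat = ["--repeat", "2"] if action == "double_click" else []
        return ["xdotool", *prefix, "click", *repeat, CLICK_BUTTON[action]]
    if action == "mouse_move":
        return ["xdotool", *prefix]
    if action == "scroll":
        button = WHEEL_BUTTON[tool_input["scroll_direction"]]
        return ["xdotool", *prefix, "click", "--repeat", str(tool_input["scroll_amount"]), button]
    if action in ("type", "key"):
        return ["xdotool", action, "--", tool_input["text"]]
    if action == "cursor_position":
        return ["xdotool", "getmouselocation", "--shell"]
    raise ValueError(f"unsupported action: {action}")
```

Executing a result is then `subprocess.run(xdotool_args(action, tool_input), check=True)`. Keeping the mapping as a pure function makes it easy to test without a display.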

Which model to use

Computer use works with Claude Sonnet 4.6 and Claude Opus 4.7. Sonnet 4.6 is the recommended default: it is faster and cheaper per token, which matters in agentic loops that can take 20 to 50 API calls per task. Reserve Opus for tasks requiring complex multi-step reasoning over ambiguous UIs. Estimate cost with the Claude Cost Calculator.

Safety considerations

Cost per task

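A typical 20-step computer use task (screenshot every step) uses roughly 50k input tokens (mostly image data) and 1k output tokens. At Sonnet 4.6 rates ($3 per million input tokens and $15 per million output tokens, the rates these figures assume), that is about $0.15 + $0.015 ≈ $0.165 per task. Complex 50-step tasks: ~$0.40. These estimates count each screenshot roughly once; because every API call resends the full conversation, input tokens grow much faster if old screenshots stay in the history and nothing is cached. Estimate your workload with the Cost Calculator.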
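The arithmetic behind these estimates can be reproduced with a short sketch. It assumes $3 / $15 per million input/output tokens (inferred from the $0.165 figure, not quoted from a price list), ignores prompt caching, and models the call after step k as carrying k screenshots unless older ones are dropped. The function names and the keep_last / base_tokens parameters are hypothetical.

```python
from typing import Optional


def task_cost(input_tokens, output_tokens, input_rate=3.0, output_rate=15.0):
    """Dollar cost at per-million-token rates (default rates are assumptions)."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def loop_input_tokens(steps, screenshot_tokens, keep_last: Optional[int] = None, base_tokens=0):
    """Total input tokens when every call resends the conversation.

    keep_last=None keeps every screenshot in the history; keep_last=k keeps the newest k.
    base_tokens stands in for per-call prompt, tool-definition and text overhead.
    """
    total = 0
    for step in range(1, steps + 1):
        images = step if keep_last is None else min(step, keep_last)
        total += base_tokens + images * screenshot_tokens
    return total


print(round(task_cost(50_000, 1_000), 3))        # 0.165
print(loop_input_tokens(20, 2_500, keep_last=1))  # 50000
print(loop_input_tokens(20, 2_500))               # 525000
print(round(task_cost(loop_input_tokens(20, 2_500), 1_000), 2))  # 1.59
```

With 2,500 tokens per screenshot counted once, 20 steps give the 50k figure above; keeping every screenshot in the history instead multiplies input tokens by about ten, which is why trimming old screenshots or caching matters.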

Frequently asked questions

What is Claude computer use?
Claude computer use is an Anthropic API beta feature that lets Claude operate a computer by viewing screenshots and emitting actions (clicks, keystrokes, scrolls). Your code executes each action and returns the resulting screenshot in a loop until the task is complete.
Does Claude computer use work on Windows?
Yes. The API itself is platform-agnostic: you capture screenshots and execute actions with platform-appropriate tools (the Win32 API or PyAutoGUI on Windows, xdotool on Linux). The bash tool's commands are executed by your own harness, so they need a bash shell, which Linux and macOS provide natively; on Windows, run them under WSL or map them to PowerShell yourself.
How much does Claude computer use cost per task?
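A 20-step task with Claude Sonnet 4.6 costs roughly $0.10–$0.20. Screenshot images are the dominant cost: by Anthropic's published approximation (tokens ≈ width × height / 750), a 1280×800 screenshot is about 1,400 tokens, and it is billed again on every later call that still carries it in the history. You can reduce cost by downscaling screenshots before sending, skipping screenshots after non-visual actions, dropping old screenshots from the history, and using Sonnet instead of Opus.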
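Downscaling has one catch: Claude returns coordinates in the space of the image it saw. A common approach is to declare the downscaled size as display_width_px / display_height_px and scale each coordinate back up before executing it. This sketch shows the math; the helper names are hypothetical, and the token estimate uses the width × height / 750 approximation, so treat it as rough.

```python
def fit_within(width, height, max_width, max_height):
    """Largest same-aspect-ratio size inside max_width x max_height (never upscales)."""
    scale = min(max_width / width, max_height / height, 1.0)
    return round(width * scale), round(height * scale)


def to_screen(coordinate, sent_size, screen_size):
    """Map [x, y] on the downscaled screenshot Claude saw back to real screen pixels."""
    x, y = coordinate
    sent_w, sent_h = sent_size
    screen_w, screen_h = screen_size
    return round(x * screen_w / sent_w), round(y * screen_h / sent_h)


def approx_image_tokens(width, height):
    """Rough image token count via width * height / 750."""
    return round(width * height / 750)


screen = (2560, 1600)
sent = fit_within(*screen, 1280, 800)       # declare this size in the computer tool definition
print(sent)                                 # (1280, 800)
print(to_screen([640, 400], sent, screen))  # (1280, 800)
print(approx_image_tokens(*sent))           # 1365
```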
Is Claude computer use production-ready in 2026?
It's out of research preview but still beta. It works well for structured automation tasks (filling forms, navigating known UIs, running terminal commands) but can struggle on novel or highly dynamic interfaces. Most teams use it for internal tooling rather than customer-facing automation.
Can I use Claude computer use in a headless environment?
Yes — run a virtual framebuffer (Xvfb on Linux) to get a display without a physical monitor. This is the standard approach for CI/CD and Docker-based computer use pipelines.
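A minimal sketch of that setup from Python, assuming Xvfb is installed and using display :1 at 1280×800 to match the display_number and size in the tool definition earlier. The helper names are hypothetical, and a production setup should also wait until the display accepts connections before taking the first screenshot.

```python
import os
import subprocess


def xvfb_command(display=1, width=1280, height=800, depth=24):
    """Command line for a virtual X display :<display> with one screen of the given size."""
    return ["Xvfb", f":{display}", "-screen", "0", f"{width}x{height}x{depth}"]


def start_virtual_display(display=1, width=1280, height=800):
    """Launch Xvfb in the background and point this process's X clients at it."""
    proc = subprocess.Popen(xvfb_command(display, width, height))
    # xdotool and ImageMagick's `import` find the display through DISPLAY
    os.environ["DISPLAY"] = f":{display}"
    return proc
```

Call proc.terminate() when the task ends so the virtual display does not outlive the run.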
