Claude Computer Use API (2026 Guide — Screenshots, Clicks & Automation)

Complete guide to Anthropic's Claude computer use API in 2026. Learn how to pass screenshots, handle tool calls, and build desktop/web automation pipelines with Claude.

Claude computer use lets the model observe a desktop screenshot and emit tool calls that your code executes — click, type, scroll, screenshot — in a loop until the task is done. It ships as three built-in tools in the Messages API: computer, text_editor, and bash.

How it works

Enabling the tools

Pass computer-use-2024-10-22 as a beta header and include the built-in tools in your request:

The agentic loop

Available computer actions

Which model to use

Action	Parameters	Effect
screenshot	none	Returns current screen as base64 PNG
left_click	coordinate [x, y]	Mouse click at pixel coordinates
right_click	coordinate [x, y]	Right-click (context menu)
double_click	coordinate [x, y]	Double-click
type	text (string)	Type characters at current focus
key	text (e.g. "ctrl+c")	Keyboard shortcut
scroll	coordinate, direction, amount	Mouse wheel scroll
mouse_move	coordinate [x, y]	Move cursor without clicking
left_click_drag	start_coordinate, coordinate	Click-and-drag
cursor_position	none	Returns current [x, y] cursor position

Computer use works with Claude Sonnet 4.6 and Claude Opus 4.7. Sonnet 4.6 is the recommended default — it's significantly faster and 5× cheaper, which matters in agentic loops that can take 20–50 API calls per task. Use Opus only for tasks requiring complex multi-step reasoning over ambiguous UIs. Estimate cost with the Claude Cost Calculator.

Safety considerations

Cost per task

A typical 20-step computer use task (screenshot every step) uses roughly 50k input tokens (mostly image data) and 1k output tokens. At Sonnet 4.6 rates: ~$0.165 per task. Complex 50-step tasks: ~$0.40. Estimate your workload with the Cost Calculator.

Frequently asked questions

What is Claude computer use?

Claude computer use is an Anthropic API beta feature that lets Claude operate a computer by viewing screenshots and emitting actions (clicks, keystrokes, scrolls). Your code executes each action and returns the resulting screenshot in a loop until the task is complete.

Does Claude computer use work on Windows?

Yes — the API itself is platform-agnostic. You capture screenshots and execute actions using platform-appropriate tools (Win32 API, PyAutoGUI, or xdotool on Linux). The built-in bash tool works natively on Linux/macOS; on Windows you'd use PowerShell or WSL.

How much does Claude computer use cost per task?

A 20-step task with Claude Sonnet 4.6 costs roughly $0.10–$0.20. Screenshot images are the dominant cost (each PNG is ~2k–5k tokens). You can reduce cost by downscaling screenshots before sending, skipping screenshots on non-visual actions, and using Sonnet instead of Opus.

Is Claude computer use production-ready in 2026?

It's out of research preview but still beta. It works well for structured automation tasks (filling forms, navigating known UIs, running terminal commands) but can struggle on novel or highly dynamic interfaces. Most teams use it for internal tooling rather than customer-facing automation.

Can I use Claude computer use in a headless environment?

Yes — run a virtual framebuffer (Xvfb on Linux) to get a display without a physical monitor. This is the standard approach for CI/CD and Docker-based computer use pipelines.

Free tools

Claude Computer Use API: Automate Desktop and Web Tasks