Claude Vision & Multimodal API Guide

Send images to Claude using the Anthropic API. Python and Node.js examples for base64 images, URLs, PDFs, and multi-image analysis. Works with Sonnet 4.6 and Haiku 4.5.

Claude's API supports multimodal inputs: you can pass images, PDFs, and text together in a single API call. Sonnet 4.6 and Haiku 4.5 both support vision. Opus 4.7 supports vision as well but is slower — for image-heavy workloads, Sonnet 4.6 is the sweet spot.

Python: analyze a local image (base64)

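import base64
import pathlib

import anthropic

# The client reads the ANTHROPIC_API_KEY environment variable by default.
client = anthropic.Anthropic()

# The API expects the raw image bytes as a base64-encoded string.
image_data = base64.standard_b64encode(pathlib.Path("diagram.png").read_bytes()).decode("utf-8")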

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe this diagram and list all labels."}
            ],
        }
    ],
)
print(message.content[0].text)

Python: image from URL

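import anthropic

client = anthropic.Anthropic()

# With a URL source the image is fetched from the URL, so it must be publicly accessible.
message = client.messages.create(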
    model="claude-sonnet-4-6",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/chart.png",
                    },
                },
                {"type": "text", "text": "What trend does this chart show?"}
            ],
        }
    ],
)
print(message.content[0].text)

Python: multiple images in one call

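import base64
import pathlib

import anthropic

client = anthropic.Anthropic()

def encode(path):
    """Return the file at `path` as a base64-encoded string."""
    return base64.standard_b64encode(pathlib.Path(path).read_bytes()).decode("utf-8")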

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": encode("before.png")}},
                {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": encode("after.png")}},
                {"type": "text", "text": "What changed between image 1 (before) and image 2 (after)?"}
            ],
        }
    ],
)
print(message.content[0].text)
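
When a prompt refers to images by position, as the before/after example does, a short text label in front of each image can make those references less ambiguous. The following is a minimal sketch under that assumption, for PNG inputs only; `image_block` and `labeled_images` are hypothetical helper names, not part of the SDK.

```python
import base64


def image_block(data: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes in a base64 image content block."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }


def labeled_images(images: list[tuple[str, bytes]], question: str) -> list[dict]:
    """Build a content list: a text label before each image, then the question."""
    content = []
    for label, data in images:
        content.append({"type": "text", "text": f"{label}:"})
        content.append(image_block(data))
    content.append({"type": "text", "text": question})
    return content
```

The returned list can be passed as the `content` of a single user message, in place of the hand-written list in the example above.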

Supported media types

image/jpeg, image/png, image/gif, image/webp
Max image size: 5 MB per image, up to 20 images per request
PDF documents: pass as type: "document" with media_type: "application/pdf"
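
As a rough sketch of the PDF case, the block below builds a `document` content block in the same shape as the image blocks and sends it together with a question. The helper names `pdf_block` and `ask_about_pdf` and the default model string are assumptions for illustration; the nested `source` layout mirrors the base64 image examples.

```python
import base64
import pathlib


def pdf_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a base64 document content block."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
        },
    }


def ask_about_pdf(path: str, question: str, model: str = "claude-sonnet-4-6") -> str:
    """Send one PDF plus a text question and return the first text block of the reply."""
    import anthropic  # imported here so pdf_block works without the SDK installed

    client = anthropic.Anthropic()
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    pdf_block(pathlib.Path(path).read_bytes()),
                    {"type": "text", "text": question},
                ],
            }
        ],
    )
    return message.content[0].text
```

A call such as `ask_about_pdf("report.pdf", "Summarize the key findings.")` would then return the model's answer as a string; `report.pdf` is a placeholder file name.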

Node.js: analyze image

import Anthropic from "@anthropic-ai/sdk";
import { readFileSync } from "fs";

const client = new Anthropic();
const imageData = readFileSync("screenshot.png").toString("base64");

// Top-level await requires an ES module (a .mjs file or "type": "module" in package.json).
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 512,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/png", data: imageData }
        },
        { type: "text", text: "Extract all text visible in this screenshot." }
      ]
    }
  ]
});
console.log(response.content[0].text);

Token cost for images

Images are converted to tokens internally. A 1080×1080 image costs roughly 1,600–2,200 tokens depending on detail level. At Sonnet 4.6 rates ($3/M input tokens) that's about $0.005 to $0.007 per image in input tokens, which is negligible for batch pipelines.
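
The arithmetic behind that estimate can be worked through in a few lines. This sketch uses only the figures quoted above (1,600 to 2,200 tokens per image, $3 per million input tokens) and covers image input tokens only; prompt text and output tokens are extra. The function name `image_cost_usd` is made up for this example.

```python
def image_cost_usd(tokens_per_image: float, price_per_million: float, n_images: int = 1) -> float:
    """Input-token cost in USD for n_images images at the given per-image token count."""
    return tokens_per_image * n_images * price_per_million / 1_000_000


# Low and high ends of the per-image token range quoted above, at $3/M input tokens.
low = image_cost_usd(1_600, 3.0)
high = image_cost_usd(2_200, 3.0)
print(f"per image: ${low:.4f} to ${high:.4f}")

# The same range scaled to a 10,000-image batch.
batch_low = image_cost_usd(1_600, 3.0, 10_000)
batch_high = image_cost_usd(2_200, 3.0, 10_000)
print(f"10,000 images: ${batch_low:.2f} to ${batch_high:.2f}")
```

At these rates a 10,000-image batch comes to roughly $48 to $66 in image input tokens.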

Use the Claude Cost Calculator to model monthly costs for image-heavy workloads. The Prompt-Pricing Recommender can help you choose Haiku vs Sonnet for vision tasks.

Frequently asked questions

Does Claude support image URLs or only base64?

Both. Use type: 'url' with a publicly accessible image URL, or type: 'base64' with a base64-encoded string. URL sources are simpler for public images; base64 is required for private/local files.
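
One way to handle both cases in the same pipeline is a small helper that picks the source type from the reference it is given. This is a sketch, not an SDK feature: `image_source` is a hypothetical name, and it assumes anything that is not an http(s) URL is a local file path with a known media type.

```python
import base64
import pathlib


def image_source(ref: str, media_type: str = "image/png") -> dict:
    """Return a `source` dict: URL source for http(s) references, base64 for local files."""
    if ref.startswith(("http://", "https://")):
        return {"type": "url", "url": ref}
    data = pathlib.Path(ref).read_bytes()
    return {
        "type": "base64",
        "media_type": media_type,
        "data": base64.standard_b64encode(data).decode("utf-8"),
    }
```

The result slots into an image block as `{"type": "image", "source": image_source(ref)}`.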

What is the maximum image size for the Claude API?

5 MB per image. You can pass up to 20 images in a single API call. Images are resized internally if needed; token usage follows pixel dimensions rather than file size, so downscaling large images before upload is what reduces token usage and latency.
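
These limits are easy to check before a request is sent. The sketch below encodes the numbers stated in this guide (four media types, 5 MB per image, 20 images per request) and assumes the 5 MB limit applies to the raw file size; `check_images` and the constant names are illustrative only.

```python
import pathlib

# Extension-to-media-type map for the four image types listed in this guide.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB, assumed here to mean raw file size
MAX_IMAGES_PER_REQUEST = 20


def check_images(images: list[tuple[str, int]]) -> list[str]:
    """Take (filename, size_in_bytes) pairs and return a list of problems; empty means OK."""
    problems = []
    if len(images) > MAX_IMAGES_PER_REQUEST:
        problems.append(f"{len(images)} images exceeds the limit of {MAX_IMAGES_PER_REQUEST}")
    for name, size in images:
        suffix = pathlib.PurePath(name).suffix.lower()
        if suffix not in MEDIA_TYPES:
            problems.append(f"{name}: unsupported file type {suffix!r}")
        if size > MAX_IMAGE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the 5 MB limit")
    return problems
```

File sizes can come from `pathlib.Path(name).stat().st_size`, so the check runs without reading the images into memory.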

Can Claude read text from images (OCR)?

Yes. Claude performs well at extracting printed and handwritten text from images. It does not use a separate OCR pipeline — the vision model handles it directly. For structured documents, combining vision with a JSON output schema improves accuracy.
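
A simple way to try the JSON approach is to describe the expected fields in the prompt and validate the reply afterwards. The sketch below does this with a prompt-described schema only; it does not use any API-level structured-output feature. The field names in `FIELDS` and the helpers `extraction_prompt` and `parse_reply` are hypothetical examples.

```python
import json

# Hypothetical fields for an invoice-style document: key -> expected type, as a hint to the model.
FIELDS = {"invoice_number": "string", "total": "number", "date": "YYYY-MM-DD string"}


def extraction_prompt(fields: dict[str, str]) -> str:
    """Build the text part of the request, listing the keys the reply must contain."""
    return (
        "Extract the following fields from the image. "
        "Reply with a single JSON object and nothing else, using exactly these keys "
        "(use null for anything not visible):\n" + json.dumps(fields, indent=2)
    )


def parse_reply(text: str, fields: dict[str, str]) -> dict:
    """Parse the JSON object in the reply and check that every expected key is present."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in reply")
    data = json.loads(text[start : end + 1])
    missing = set(fields) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In use, `extraction_prompt(FIELDS)` would be sent as the text block next to the image, and `parse_reply(message.content[0].text, FIELDS)` applied to the response; a `ValueError` signals a reply worth retrying or reviewing by hand.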

Which Claude models support vision?

Claude Sonnet 4.6, Claude Haiku 4.5, and Claude Opus 4.7 all support multimodal inputs. Haiku 4.5 is the fastest and cheapest option for high-volume image tasks where Opus-level reasoning is not required.
