Calling the Claude API from Ruby in 2026 — the official anthropic gem, streaming, prompt caching, tool use, and how to integrate with Rails background jobs.
Anthropic ships an official Ruby SDK: the anthropic gem on rubygems.org, with source at github.com/anthropics/anthropic-sdk-ruby. It requires Ruby 3.2 or newer. It has feature parity with the Python and TypeScript SDKs: messages, streaming, batch, prompt caching, tool use, and vision.
# Gemfile
gem "anthropic"
# or install it directly from the shell
gem install anthropic
require "anthropic"

client = Anthropic::Client.new(api_key: ENV["ANTHROPIC_API_KEY"])

message = client.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain Ruby blocks in 3 bullets." }]
)
puts message.content.first.text
stream = client.messages.stream(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Write a limerick." }]
)
# stream.text yields each text delta as it arrives
stream.text.each do |text|
  print text
end
client.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: long_system_prompt, # 20k+ tokens
      cache_control: { type: "ephemeral" }
    }
  ],
  messages: [{ role: "user", content: question }]
)
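Cache reads are billed at 10% of the base input price. The first request, which writes the cache, is billed at a 25% premium over the base input price for the default 5-minute TTL. Check the usage field on each response for cache_creation_input_tokens and cache_read_input_tokens to confirm the cache is actually being hit. Background: prompt caching explained.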
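To make the pricing concrete, here is a minimal sketch of the arithmetic. The base rate of $3.00 per million input tokens is a placeholder assumption, not a quoted price, and input_cost_usd is a hypothetical helper, not part of the SDK. The keyword names mirror the usage counters mentioned above.

```ruby
# Placeholder base input price in USD per million tokens (an assumption;
# substitute the published rate for the model you actually use).
BASE_INPUT_PER_MTOK = 3.00
CACHE_WRITE_MULTIPLIER = 1.25 # premium for writing the 5-minute cache
CACHE_READ_MULTIPLIER = 0.10  # cached reads cost 10% of base input

# Hypothetical helper: estimates the input-side cost of one response
# from its usage counters.
def input_cost_usd(input_tokens:, cache_creation_input_tokens: 0, cache_read_input_tokens: 0)
  weighted_tokens =
    input_tokens +
    cache_creation_input_tokens * CACHE_WRITE_MULTIPLIER +
    cache_read_input_tokens * CACHE_READ_MULTIPLIER
  (weighted_tokens * BASE_INPUT_PER_MTOK / 1_000_000).round(6)
end

# First request writes a 20k-token system prompt to the cache;
# the second request reads it back.
first  = input_cost_usd(input_tokens: 50, cache_creation_input_tokens: 20_000)
second = input_cost_usd(input_tokens: 50, cache_read_input_tokens: 20_000)
puts format("first request:  $%.6f", first)
puts format("second request: $%.6f", second)
```

Under these assumptions the write premium is paid once, so caching pays for itself as soon as the prompt is reused within the TTL.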
class ClaudeCompletionJob
  include Sidekiq::Job
  sidekiq_options retry: 5, queue: "claude"

  # Wait a fixed 30 seconds after a 429; returning nil for any other
  # error falls back to Sidekiq's default exponential backoff.
  sidekiq_retry_in do |_count, exception|
    30 if exception.is_a?(Anthropic::Errors::RateLimitError)
  end

  def perform(prompt_id)
    prompt = Prompt.find(prompt_id)
    client = Anthropic::Client.new # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
      model: "claude-sonnet-4-6",
      max_tokens: 2048,
      messages: [{ role: "user", content: prompt.text }]
    )
    prompt.update!(
      completion: response.content.first.text,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens
    )
  end
end
Sidekiq's retry-with-backoff plus the SDK's built-in retries (configurable with Anthropic::Client.new(max_retries: 5)) cover most 429s without custom logic.
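For intuition about what a retry layer does with a 429, here is a rough sketch of a delay calculation. It is not the SDK's or Sidekiq's actual algorithm. retry_delay is a hypothetical helper, and the base, cap, and jitter range are illustrative assumptions.

```ruby
# Hypothetical helper: seconds to wait before retry number `attempt` (0-based).
# Prefers a numeric Retry-After value from the server when one is present,
# otherwise uses capped exponential backoff with jitter.
def retry_delay(attempt, retry_after: nil, base: 0.5, cap: 8.0, rng: Random.new)
  # Retry-After can also be an HTTP date; non-numeric values parse to nil here
  # and fall through to the backoff calculation.
  server_hint = Float(retry_after, exception: false) if retry_after
  return server_hint if server_hint && server_hint >= 0

  backoff = [base * (2**attempt), cap].min
  # Scale by a random factor in [0.75, 1.0) so concurrent workers
  # do not all retry at the same instant.
  backoff * (0.75 + rng.rand * 0.25)
end

puts retry_delay(0, retry_after: "30") # server-provided delay wins
puts retry_delay(3)                    # somewhere between 3.0 and 4.0 seconds
puts retry_delay(10)                   # never more than the 8-second cap
```

The jitter matters most in a worker pool: without it, every job that hit the same rate limit would retry in lockstep and hit it again.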
tools = [{
  name: "get_weather",
  description: "Get current weather for a location.",
  input_schema: {
    type: "object",
    properties: { location: { type: "string" } },
    required: ["location"]
  }
}]

response = client.messages.create(
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools: tools,
  messages: [{ role: "user", content: "What's the weather in Tokyo?" }]
)

response.content.each do |block|
  # the SDK exposes block types as symbols and parses tool input with symbol keys
  if block.type == :tool_use
    result = WeatherService.fetch(block.input[:location])
    # feed result back on the next turn
  end
end
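Feeding the result back means sending a follow-up request whose history contains the assistant's tool_use turn followed by a user turn carrying a tool_result block. A minimal sketch of building that history with plain hashes follows. follow_up_messages is a hypothetical helper, and the tool id and weather values are made up for illustration.

```ruby
require "json"

# Hypothetical helper: builds the message history for the request that
# follows a tool call. `assistant_content` is the content array from the
# previous response; `tool_use_id` is the id of the tool_use block answered.
def follow_up_messages(history, assistant_content, tool_use_id, result)
  history + [
    { role: "assistant", content: assistant_content },
    {
      role: "user",
      content: [
        { type: "tool_result", tool_use_id: tool_use_id, content: JSON.generate(result) }
      ]
    }
  ]
end

history = [{ role: "user", content: "What's the weather in Tokyo?" }]
# Stand-in for response.content from the previous call (illustrative values).
assistant_content = [
  { type: "tool_use", id: "toolu_example", name: "get_weather", input: { location: "Tokyo" } }
]

messages = follow_up_messages(history, assistant_content, "toolu_example", { temp_c: 18, sky: "clear" })
puts JSON.pretty_generate(messages)
```

The resulting array would be passed as messages: on the next client.messages.create call, together with the same tools: definition, so Claude can turn the raw result into an answer.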
Instantiate the client once in config/initializers/anthropic.rb and reuse it. The client is thread-safe.
Persist response.usage per call to a claude_usage_events table and roll up daily. The Claude Cost Calculator models per-MTok rates so you can forecast monthly bills from your usage logs.
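The daily roll-up can be sketched in plain Ruby before committing to a schema. The UsageEvent struct below is a hypothetical stand-in for a claude_usage_events row, and its column names are assumptions. In a Rails app the same grouping would more likely be a SQL GROUP BY.

```ruby
require "date"

# Hypothetical shape of one claude_usage_events row; adjust to your schema.
UsageEvent = Struct.new(:created_on, :input_tokens, :output_tokens, keyword_init: true)

# Sums token counts and call counts per calendar day.
def daily_rollup(events)
  events.group_by(&:created_on).transform_values do |day_events|
    {
      input_tokens: day_events.sum(&:input_tokens),
      output_tokens: day_events.sum(&:output_tokens),
      calls: day_events.size
    }
  end
end

events = [
  UsageEvent.new(created_on: Date.new(2026, 1, 5), input_tokens: 1_200, output_tokens: 300),
  UsageEvent.new(created_on: Date.new(2026, 1, 5), input_tokens: 800, output_tokens: 200),
  UsageEvent.new(created_on: Date.new(2026, 1, 6), input_tokens: 500, output_tokens: 100)
]

daily_rollup(events).each do |day, totals|
  puts "#{day}: #{totals[:calls]} calls, #{totals[:input_tokens]} in, #{totals[:output_tokens]} out"
end
```

Multiplying each day's totals by your per-MTok rates then gives a daily spend figure to extrapolate from.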
Alternative SDK paths: Python SDK, TypeScript SDK, Go SDK.