
Lesson 22: Batch API for Bulk Processing

When you need to process hundreds or thousands of items and don't need instant results, the Batch API is your best friend. It runs requests asynchronously at a 50% cost reduction compared to real-time API calls — and it stacks with prompt caching for even greater savings.


What Is the Batch API?

The Batch API lets you submit a collection of message requests as a single batch job. Anthropic processes them in the background within a 24-hour window. You don't get streaming or real-time responses — instead, you poll for completion and then download all results at once.

Think of it like a print queue: you submit the jobs, walk away, and come back when they're done.

50% off, automatically. Every request in a batch costs half what the same request would cost through the real-time Messages API. No special pricing tier required.


When to Use Batch API

The Batch API is ideal when:

  • Processing large datasets — classifying thousands of documents, extracting data from hundreds of records
  • Running overnight jobs — analysis pipelines that don't need immediate results
  • Bulk content generation — translating articles, generating summaries, creating descriptions
  • Cost-sensitive workloads — any task where you're willing to trade latency for savings

It is not suitable for:

  • Real-time chat or interactive applications
  • Anything requiring streaming responses
  • Tasks where you need results in under a few minutes

Batch Request Format

Each batch entry is a standalone message request with a custom ID you assign. In raw form, requests and results use JSONL (one JSON object per line); the Python SDK accepts the same objects as a plain list.

JSONL
{"custom_id": "doc-001", "params": {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [{"role": "user", "content": "Summarize: The quick brown fox..."}]}}
{"custom_id": "doc-002", "params": {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [{"role": "user", "content": "Summarize: In a galaxy far away..."}]}}
{"custom_id": "doc-003", "params": {"model": "claude-sonnet-4-20250514", "max_tokens": 1024, "messages": [{"role": "user", "content": "Summarize: Once upon a time..."}]}}

The custom_id field is how you match results back to your original requests. Use meaningful IDs (document names, database row IDs, etc.).


Submitting a Batch

Here's a complete workflow: create the batch requests, submit them, poll for completion, and retrieve results.

Python
import anthropic
import json
import time

client = anthropic.Anthropic()

# Step 1: Prepare your requests
documents = [
    {"id": "doc-001", "text": "First document content here..."},
    {"id": "doc-002", "text": "Second document content here..."},
    {"id": "doc-003", "text": "Third document content here..."},
]

requests = []
for doc in documents:
    requests.append({
        "custom_id": doc["id"],
        "params": {
            "model": "claude-sonnet-4-20250514",  # Check docs.anthropic.com for latest model IDs
            "max_tokens": 512,
            "messages": [
                {
                    "role": "user",
                    "content": f"Summarize this document in 2-3 sentences:\n\n{doc['text']}",
                }
            ],
        },
    })

# Step 2: Submit the batch
batch = client.messages.batches.create(requests=requests)
print(f"Batch submitted: {batch.id}")
print(f"Status: {batch.processing_status}")

# Step 3: Poll for completion
while batch.processing_status != "ended":
    time.sleep(30)  # Check every 30 seconds
    batch = client.messages.batches.retrieve(batch.id)
    succeeded = batch.request_counts.succeeded
    total = batch.request_counts.processing + succeeded + batch.request_counts.errored
    print(f"Progress: {succeeded}/{total} completed")

print(f"Batch complete! {batch.request_counts.succeeded} succeeded, "
      f"{batch.request_counts.errored} errored")

# Step 4: Retrieve results
results = {}
for result in client.messages.batches.results(batch.id):
    custom_id = result.custom_id
    if result.result.type == "succeeded":
        text = result.result.message.content[0].text
        results[custom_id] = text
    else:
        results[custom_id] = f"ERROR: {result.result.error}"

# Print results
for doc_id, summary in results.items():
    print(f"\n--- {doc_id} ---")
    print(summary)
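For long-running batches you may not want to keep a script alive just to hold results in memory. Persisting the results dict to disk as JSONL makes reruns and downstream processing easier. A minimal sketch — `save_results` and `load_results` are illustrative helpers, not part of the SDK:

```python
import json

def save_results(results: dict, path: str) -> None:
    """Write one JSON object per line: {"custom_id": ..., "output": ...}."""
    with open(path, "w", encoding="utf-8") as f:
        for custom_id, output in results.items():
            f.write(json.dumps({"custom_id": custom_id, "output": output}) + "\n")

def load_results(path: str) -> dict:
    """Read the JSONL file back into a {custom_id: output} dict."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            loaded[record["custom_id"]] = record["output"]
    return loaded

results = {"doc-001": "A fox jumps.", "doc-002": "A space saga."}
save_results(results, "batch_results.jsonl")
```

Because each line carries its custom_id, a partially processed file can always be reconciled against the original request list.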

Real-World Use Cases

Document Classification

Python
# Classify thousands of support tickets by category
requests = []
for ticket in support_tickets:
    requests.append({
        "custom_id": ticket["id"],
        "params": {
            "model": "claude-haiku-4-20250514",  # Check docs.anthropic.com for latest model IDs
            "max_tokens": 50,
            "messages": [{
                "role": "user",
                "content": f"Classify this support ticket into exactly one category "
                           f"(billing, technical, account, other): {ticket['text']}"
            }],
        },
    })

batch = client.messages.batches.create(requests=requests)
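Once the batch finishes, the raw completions still need normalizing — models sometimes add whitespace, trailing punctuation, or varied capitalization. A sketch of the post-processing step (`tally_categories` is an illustrative helper, not an SDK function):

```python
from collections import Counter

VALID = {"billing", "technical", "account", "other"}

def tally_categories(outputs: dict) -> Counter:
    """Normalize each raw completion and count tickets per category."""
    counts = Counter()
    for ticket_id, raw in outputs.items():
        label = raw.strip().lower().rstrip(".")
        counts[label if label in VALID else "unparseable"] += 1
    return counts

# Raw model outputs as they might come back from the batch:
outputs = {"t-1": "Billing", "t-2": " technical\n", "t-3": "Refund request"}
print(tally_categories(outputs))
```

Tracking an "unparseable" bucket is worth the extra line — on a batch of thousands, a few off-format answers are almost inevitable.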

Data Extraction

Python
# Extract structured data from unstructured text
requests = []
for record in raw_records:
    requests.append({
        "custom_id": record["id"],
        "params": {
            "model": "claude-sonnet-4-20250514",  # Check docs.anthropic.com for latest model IDs
            "max_tokens": 256,
            "tools": [extract_fields_tool],  # Use tool_use for reliable structure
            "messages": [{
                "role": "user",
                "content": f"Extract the name, date, and amount from: {record['text']}"
            }],
        },
    })
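The snippet above references extract_fields_tool without defining it. One plausible shape for that tool definition — the names and fields here are assumptions for illustration, not a schema from the API docs:

```python
# A plausible definition for the extract_fields_tool referenced above —
# adjust the name, descriptions, and fields to match your data.
extract_fields_tool = {
    "name": "extract_fields",
    "description": "Record the structured fields found in a document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Full name of the party"},
            "date": {"type": "string", "description": "Date in YYYY-MM-DD form"},
            "amount": {"type": "number", "description": "Monetary amount"},
        },
        "required": ["name", "date", "amount"],
    },
}
```

Adding `"tool_choice": {"type": "tool", "name": "extract_fields"}` to each request's params forces the model to call the tool, so every successful result contains a tool_use block you can parse directly instead of free text.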

Combining Batch API with Prompt Caching

For maximum savings, use prompt caching alongside the Batch API. If all your batch requests share a long system prompt or reference document, cache that shared content.

Python
# All requests share this large context — mark it for caching
shared_system = {
    "type": "text",
    "text": "You are a legal document analyzer. Here is the full regulatory framework: ...",
    "cache_control": {"type": "ephemeral"},
}

requests = []
for doc in documents:
    requests.append({
        "custom_id": doc["id"],
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "system": [shared_system],
            "messages": [{"role": "user", "content": f"Analyze: {doc['text']}"}],
        },
    })

The first request that writes to a fresh cache pays the one-time cache-write premium; subsequent requests read the cached prefix at roughly a 90% discount on those input tokens. Note that batch requests may be processed concurrently, so cache hits within a batch are best-effort rather than guaranteed. Combined with the 50% batch discount, cached input tokens can cost as little as ~5% of the standard rate — overall savings of 90-95% compared to making the same requests individually without caching.
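The arithmetic behind that claim is easy to check. A sketch using an assumed base input price of $3 per million tokens — verify current rates and discount multipliers on Anthropic's pricing page:

```python
def effective_input_price(base_per_mtok: float, batched: bool, cached: bool) -> float:
    """Illustrative pricing model: a cache read charges ~10% of the base
    input rate, the batch discount halves the price, and the two multiply."""
    price = base_per_mtok
    if cached:
        price *= 0.10   # cache read ≈ 10% of base input price
    if batched:
        price *= 0.50   # batch discount
    return price

base = 3.00  # assumed $/MTok input rate — check the pricing page
print(round(effective_input_price(base, batched=False, cached=False), 4))  # 3.0
print(round(effective_input_price(base, batched=True,  cached=False), 4))  # 1.5
print(round(effective_input_price(base, batched=True,  cached=True), 4))   # 0.15
```

Going from $3.00 to $0.15 per million cached input tokens is the "95% off" figure; output tokens only get the batch half-price, since caching applies to input.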


Limitations to Know

| Limitation | Detail |
| --- | --- |
| Processing time | Up to 24 hours (usually much faster) |
| No streaming | Results are only available after processing completes |
| No real-time | Not suitable for interactive or time-sensitive applications |
| Request limits | Each batch can contain up to 10,000 requests |
| Expiration | Results are available for 29 days after completion |
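When a workload exceeds the per-batch request limit, split it into multiple batches. A sketch of a chunking helper — `chunk_requests` is illustrative, and the commented submission loop assumes the client from earlier:

```python
def chunk_requests(requests, size=10_000):
    """Yield successive slices of at most `size` requests."""
    for start in range(0, len(requests), size):
        yield requests[start:start + size]

# 25,000 dummy requests split into batches of 10k / 10k / 5k
dummy = [{"custom_id": f"req-{i}", "params": {}} for i in range(25_000)]
chunks = list(chunk_requests(dummy))
print([len(c) for c in chunks])  # [10000, 10000, 5000]

# Submitting each chunk would then look like (requires a configured client):
# for chunk in chunks:
#     batch = client.messages.batches.create(requests=chunk)
```

Keep custom_id values unique across all chunks so results from separate batches can still be merged into one mapping.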

Key Takeaways

  • Batch API gives a flat 50% cost reduction on all requests — no strings attached
  • Submit requests as JSONL with unique custom_id values for result matching
  • Poll for completion, then download all results at once
  • Use Haiku in batches for classification and simple tasks — cheap model + batch discount = extremely low cost
  • Combine with prompt caching for 90%+ savings on shared-context workloads
  • Not for real-time use — accept the 24-hour window in exchange for dramatic cost savings
  • Use tool_use in batch requests to get structured output you can reliably parse