Building AI Agents with the Claude SDK: A Practical Guide for Developers

Most tutorials about building AI agents focus on the happy path. The agent calls a tool, gets a result, continues. Clean. Simple. Nothing like what you actually deal with in production.

This guide is different. I've been building Claude-powered agents for my ecommerce operation for eight months. Some of them run dozens of times a day. Here's what actually works - including the parts that are messy.

What "Agent" Actually Means Here

Before we touch any code, let's align on terminology because this word is overloaded.

An agent, in the context of the Claude SDK, is a loop:

Give Claude a task and tools
Claude decides whether to use a tool
If yes: execute the tool, feed the result back to Claude
Repeat until Claude says it's done

That's it. The magic is in how you design the tools, structure the context, and handle the failure cases.

The Minimal Agent

Here's the smallest useful agent I can show you:

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_product_inventory",
        "description": "Get current inventory count for a product SKU",
        "input_schema": {
            "type": "object",
            "properties": {
                "sku": {
                    "type": "string",
                    "description": "The product SKU to check"
                }
            },
            "required": ["sku"]
        }
    }
]

def run_agent(task: str):
    messages = [{"role": "user", "content": task}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Agent is done
        if response.stop_reason == "end_turn":
            return response.content[0].text

        # Agent wants to use a tool
        if response.stop_reason == "tool_use":
            tool_use = next(b for b in response.content if b.type == "tool_use")

            # Execute the tool
            result = execute_tool(tool_use.name, tool_use.input)

            # Add the exchange to message history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "content": str(result)
                }]
            })

This is the core loop that every Claude agent is built on. The rest is complexity management.

The loop runs until stop_reason == "end_turn". Everything else is about what happens inside the loop.

Designing Tools That Actually Work

The quality of your agent is almost entirely determined by your tool design. Bad tools make even great models perform poorly.

Rule 1: One tool, one responsibility.

I've seen developers build tools like manage_inventory that handles checking, updating, and reporting inventory. This confuses the model and produces unpredictable behavior.

Instead: get_inventory, update_inventory, generate_inventory_report. Three tools with crystal-clear purposes.

Rule 2: Descriptions are prompts.

Your tool description is not documentation. It's instruction. Write it like you're telling a smart colleague exactly when and how to use this function.

Bad:

"description": "Gets order data"

Good:

"description": "Retrieves detailed order information including line items, customer data, shipping status, and fulfillment history. Use this when you need to analyze a specific order or when a customer asks about their order status. Requires a valid order ID."

Rule 3: Return structured data, not prose.

Your tool results feed back into the model's context. Structured data (JSON) is more reliably understood than natural language summaries.

# Bad tool return
return f"There are 47 units of SKU-123 in stock, last updated Tuesday"

# Good tool return  
return {
    "sku": "SKU-123",
    "quantity": 47,
    "last_updated": "2026-04-22T14:30:00Z",
    "warehouse": "main"
}

Prompt Caching: The Performance Multiplier

If your agent runs repeatedly with similar system prompts (and it will), prompt caching will cut your costs significantly and improve response times.

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": """You are an inventory management agent for an ecommerce store.

Your responsibilities:
- Check inventory levels when asked
- Flag items that need reordering (below 10 units)
- Generate reorder recommendations with quantities
- Track inventory changes over time

Always verify data before making recommendations. Be conservative with reorder quantities.""",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=tools,
    messages=messages
)

60% of my agent API costs disappeared after adding prompt caching. The system prompt gets cached after the first call and reused across the entire conversation.

The cache_control: ephemeral tells Anthropic to cache this content. The cache persists for 5 minutes, which covers most agent loops. For longer operations, you can cache at multiple breakpoints in the conversation.

Handling Failure Gracefully

Production agents fail. Here's how to handle it without your entire workflow breaking.

Tool execution errors:

def execute_tool(name: str, inputs: dict) -> dict:
    try:
        if name == "get_product_inventory":
            return get_inventory(inputs["sku"])
        # ... other tools
    except Exception as e:
        # Return error as structured data so Claude can decide what to do
        return {
            "error": True,
            "error_type": type(e).__name__,
            "message": str(e),
            "recoverable": isinstance(e, (TimeoutError, ConnectionError))
        }

When you return structured error data instead of raising an exception, Claude can often recover - retrying the operation, trying an alternative approach, or explaining to the user what happened.

Infinite loop protection:

def run_agent(task: str, max_iterations: int = 10):
    messages = [{"role": "user", "content": task}]
    iterations = 0

    while iterations < max_iterations:
        iterations += 1
        response = client.messages.create(...)

        if response.stop_reason == "end_turn":
            return response.content[0].text

        # ... handle tool use

    return "Agent reached maximum iterations without completing the task."

Set max_iterations based on your task complexity. Simple lookups: 5. Complex multi-step operations: 15-20.

Multi-Agent Patterns

Single agents are powerful. Multiple agents working together can handle complexity that would overwhelm any single context window.

The pattern I use most: orchestrator + specialists.

# Orchestrator decides what needs to happen
orchestrator_result = run_agent(
    task="Analyze our inventory situation and create a reorder plan",
    tools=[route_to_inventory_agent, route_to_pricing_agent, route_to_supplier_agent]
)

# Specialists handle specific domains
def route_to_inventory_agent(query: str) -> dict:
    return run_specialized_agent(
        system="You are an inventory specialist...",
        tools=[get_inventory, update_inventory, get_sales_velocity],
        task=query
    )

The orchestrator never touches raw data. It coordinates specialists who do. This keeps each agent's context focused and its tool set manageable.

At the end of the day, a single agent with 30 tools is harder to debug and less reliable than three agents with 10 tools each.

Streaming for Long Operations

For operations that take more than a few seconds, streaming makes the experience dramatically better.

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=tools,
    messages=messages
) as stream:
    for event in stream:
        if hasattr(event, 'delta') and hasattr(event.delta, 'text'):
            print(event.delta.text, end="", flush=True)

This is particularly valuable for agents that generate reports or analysis as their final output. Users see progress instead of waiting for a spinner.

The Observability Problem

The hardest part of running agents in production isn't building them. It's understanding what they did when something goes wrong.

My solution: log every tool call and result.

import json
from datetime import datetime

def execute_tool_with_logging(name: str, inputs: dict) -> dict:
    start_time = datetime.now()
    result = execute_tool(name, inputs)
    duration_ms = (datetime.now() - start_time).total_seconds() * 1000

    log_entry = {
        "timestamp": start_time.isoformat(),
        "tool": name,
        "inputs": inputs,
        "result": result,
        "duration_ms": duration_ms
    }

    # Write to your logging system
    append_to_agent_log(log_entry)

    return result

This log lets you reconstruct exactly what an agent did, in what order, with what data. When a bug appears (and it will), you won't be debugging blind.

Starting Simple, Scaling Up

Here's the progression I'd recommend:

Week 1: Build one agent with two or three tools for a task you currently do manually. Don't optimize. Just get it working.

Week 2: Add error handling and logging. Run it in production but monitor it closely.

Week 3: Add prompt caching. Measure the cost and latency improvement.

Month 2: Extract specialists for different domains. Build the orchestrator pattern.

The agents I run today took about six months to reach their current form. They didn't start that way. They started with three tools and grew as I understood what they needed to do.

Resources

The tools I've built for managing Claude agents are available at mynextools.com - including workflow templates and a monitoring dashboard for tracking agent runs.

The full Anthropic SDK documentation is thorough and worth reading: the tool use guide in particular covers edge cases I didn't have space for here.

What are you trying to automate with Claude agents? Drop it in the comments - I read every one and try to cover the most common use cases in future posts.

If you found this useful, follow me here. I publish a new deep-dive every week.

Building AI Agents with the Claude SDK: A Practical Guide for Developers

What "Agent" Actually Means Here

The Minimal Agent

Designing Tools That Actually Work

Prompt Caching: The Performance Multiplier

Handling Failure Gracefully

Multi-Agent Patterns

Streaming for Long Operations

The Observability Problem

Starting Simple, Scaling Up

Resources

Comments

More from this blog

Claude Code for TypeScript Migrations: How I Converted a 200,000-Line JavaScript Codebase Without Stopping Shipping

Claude Code for Dependency Management: How I Stopped Being Afraid of npm Update

Claude Code for Incident Response: How I Cut My Mean Time to Recovery in Half

Claude Code for Security Audits: How I Catch Vulnerabilities Before They Cost Me

Claude Code for Documentation Generation: How I Stopped Shipping Code Nobody Could Read

Command Palette

What "Agent" Actually Means Here

The Minimal Agent

Designing Tools That Actually Work

Prompt Caching: The Performance Multiplier

Handling Failure Gracefully

Multi-Agent Patterns

Streaming for Long Operations

The Observability Problem

Starting Simple, Scaling Up

Resources

Comments

More from this blog