Building AI Agents with the Claude SDK: A Practical Guide for Developers
Most tutorials about building AI agents focus on the happy path. The agent calls a tool, gets a result, continues. Clean. Simple. Nothing like what you actually deal with in production.
This guide is different. I've been building Claude-powered agents for my ecommerce operation for eight months. Some of them run dozens of times a day. Here's what actually works - including the parts that are messy.
--
What "Agent" Actually Means Here
Before we touch any code, let's align on terminology, because this word is overloaded.
An agent, in the context of the Claude SDK, is a loop:
- Give Claude a task and tools
- Claude decides whether to use a tool
- If yes: execute the tool, feed the result back to Claude
- Repeat until Claude says it's done
That's it. The magic is in how you design the tools, structure the context, and handle the failure cases.
--
The Minimal Agent
Here's the smallest useful agent I can show you:
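```python
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [
    {
        "name": "get_product_inventory",
        "description": "Get current inventory count for a product SKU",
        "input_schema": {
            "type": "object",
            "properties": {
                "sku": {
                    "type": "string",
                    "description": "The product SKU to check"
                }
            },
            "required": ["sku"]
        }
    }
]

def run_agent(task: str):
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )

        # Agent is done: return the text blocks of its final answer
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")

        # Agent wants to use one or more tools
        if response.stop_reason == "tool_use":
            # Add Claude's turn (including its tool_use blocks) to the history
            messages.append({"role": "assistant", "content": response.content})

            # Claude can request several tools in one turn; every tool_use
            # block needs a matching tool_result in the next user message
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # execute_tool is defined under "Handling Failure Gracefully"
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)  # valid JSON, not a Python repr
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (e.g. max_tokens) would otherwise loop forever
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
```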
This is the core loop that every Claude agent is built on. It runs until stop_reason == "end_turn"; everything else is complexity management inside the loop.
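To make the protocol concrete, here is roughly what messages holds after one tool round-trip, written as plain dicts (the SDK accepts dicts when you send them; responses come back as typed objects, which the loop appends directly). The tool_use id, the SKU and the result are made-up placeholders.

```python
import json

# Hypothetical snapshot of `messages` after one tool round-trip.
# The id "toolu_example_01", the SKU and the result values are placeholders.
messages = [
    # 1. The original task
    {"role": "user", "content": "How many units of SKU-123 do we have?"},
    # 2. Claude's turn: optional text plus a tool_use block
    {
        "role": "assistant",
        "content": [
            {"type": "text", "text": "Let me check the inventory."},
            {
                "type": "tool_use",
                "id": "toolu_example_01",
                "name": "get_product_inventory",
                "input": {"sku": "SKU-123"},
            },
        ],
    },
    # 3. Our turn: a tool_result whose tool_use_id matches the tool_use id
    {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": "toolu_example_01",
                "content": json.dumps({"sku": "SKU-123", "quantity": 47}),
            }
        ],
    },
]

# The turns alternate: user, assistant, user
roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'user']
```

Claude then gets the whole list on the next call, reads the tool_result, and either answers or asks for another tool.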
--
Designing Tools That Actually Work
The quality of your agent is almost entirely determined by your tool design. Bad tools make even great models perform poorly.
Rule 1: One tool, one responsibility.
I've seen developers build a single manage_inventory tool that handles checking, updating, and reporting inventory. This confuses the model and produces unpredictable behavior.
Instead: get_inventory, update_inventory, generate_inventory_report. Three tools with crystal-clear purposes.
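Here's what the split looks like as tool definitions. The three names come from the text above; the parameters (quantity_delta, days) and the descriptions are hypothetical, there only to show how narrow each schema becomes once a tool has one job.

```python
# Three single-purpose tool definitions instead of one manage_inventory tool.
# Parameter names and descriptions are illustrative assumptions.
inventory_tools = [
    {
        "name": "get_inventory",
        "description": "Return the current stock level for one product SKU. "
                       "Read-only; use it before recommending any change.",
        "input_schema": {
            "type": "object",
            "properties": {"sku": {"type": "string", "description": "Product SKU"}},
            "required": ["sku"],
        },
    },
    {
        "name": "update_inventory",
        "description": "Adjust the stock level for one SKU by a signed amount "
                       "(positive for received stock, negative for removals).",
        "input_schema": {
            "type": "object",
            "properties": {
                "sku": {"type": "string", "description": "Product SKU"},
                "quantity_delta": {"type": "integer", "description": "Signed change in units"},
            },
            "required": ["sku", "quantity_delta"],
        },
    },
    {
        "name": "generate_inventory_report",
        "description": "Summarize stock levels across all SKUs for a reporting "
                       "window. Use only when a report is explicitly requested.",
        "input_schema": {
            "type": "object",
            "properties": {
                "days": {"type": "integer", "description": "Look-back window in days"},
            },
            "required": ["days"],
        },
    },
]

# Each tool has a distinct name and declares only the inputs it needs
names = [t["name"] for t in inventory_tools]
print(names)
```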
Rule 2: Descriptions are prompts.
Your tool description is not documentation. It's instruction. Write it like you're telling a smart colleague exactly when and how to use this function.
Bad:
"description": "Gets order data"
Good:
"description": "Retrieves detailed order information including line items, customer data, shipping status, and fulfillment history. Use this when you need to analyze a specific order or when a customer asks about their order status. Requires a valid order ID."
Rule 3: Return structured data, not prose.
Your tool results feed back into the model's context. Structured data (JSON) is more reliably understood than natural language summaries.
```python
# Bad tool return
return "There are 47 units of SKU-123 in stock, last updated Tuesday"

# Good tool return
return {
    "sku": "SKU-123",
    "quantity": 47,
    "last_updated": "2026-04-22T14:30:00Z",
    "warehouse": "main"
}
```
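One detail when feeding that dict back: a tool_result block's content has to be a string (or a list of content blocks), so the dict must be serialized. json.dumps produces valid JSON, while str(result) gives a Python repr with single quotes. A sketch of a helper; the name make_tool_result and the ISO 8601 fallback for datetimes are my own choices.

```python
import json
from datetime import datetime, timezone

def _json_default(value):
    # datetimes become ISO 8601 strings; anything else falls back to str()
    if isinstance(value, datetime):
        return value.isoformat()
    return str(value)

def make_tool_result(tool_use_id: str, result: dict) -> dict:
    """Wrap a structured tool return value in a tool_result content block."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result, default=_json_default),
    }

block = make_tool_result("toolu_example_01", {
    "sku": "SKU-123",
    "quantity": 47,
    "last_updated": datetime(2026, 4, 22, 14, 30, tzinfo=timezone.utc),
    "warehouse": "main",
})
print(block["content"])
```

Serializing in one place means your tool functions can return native Python types and the loop never sends Claude a repr it has to guess at.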
--
Prompt Caching: The Performance Multiplier
If your agent runs repeatedly with similar system prompts (and it will), prompt caching will cut your costs significantly and improve response times.
```python
SYSTEM_PROMPT = """You are an inventory management agent for an ecommerce store.
Your responsibilities:
- Check inventory levels when asked
- Flag items that need reordering (below 10 units)
- Generate reorder recommendations with quantities
- Track inventory changes over time
Always verify data before making recommendations. Be conservative with reorder quantities."""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # Cache breakpoint: caches the prefix up to here (tools + system prompt)
            "cache_control": {"type": "ephemeral"}
        }
    ],
    tools=tools,
    messages=messages
)
```
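60% of my agent API costs disappeared after adding prompt caching. The first call writes the system prompt to the cache; later calls with the identical prefix read it back, across the whole conversation and across separate runs while the cache is alive. One catch: a prefix shorter than the model's minimum cacheable length (1,024 tokens for recent Sonnet models at the time of writing; check the docs for your model) is silently not cached, so a prompt as short as the example above only benefits once it is combined with a larger tool list or fuller instructions.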
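To check that caching is actually happening, look at response.usage: cache_creation_input_tokens counts tokens written to the cache, cache_read_input_tokens counts tokens served from it, and input_tokens counts only the uncached remainder. The sketch below turns those into a cost ratio. The multipliers (1.25x for 5-minute cache writes, 0.1x for reads, relative to the base input price) reflect Anthropic's published pricing at the time of writing; treat them as assumptions and check the current pricing page.

```python
from types import SimpleNamespace

# Assumed multipliers relative to the base input-token price (5-minute cache)
CACHE_WRITE_MULTIPLIER = 1.25
CACHE_READ_MULTIPLIER = 0.1

def input_cost_ratio(usage) -> float:
    """Input cost of one call with caching, relative to the same input uncached."""
    uncached = usage.input_tokens or 0
    written = usage.cache_creation_input_tokens or 0
    read = usage.cache_read_input_tokens or 0
    total = uncached + written + read
    if total == 0:
        return 1.0
    with_cache = uncached + CACHE_WRITE_MULTIPLIER * written + CACHE_READ_MULTIPLIER * read
    return with_cache / total

# Illustrative usage objects: a first call that writes 3,000 cached tokens,
# then a repeat call that reads them back
first_call = SimpleNamespace(input_tokens=200, cache_creation_input_tokens=3000, cache_read_input_tokens=0)
repeat_call = SimpleNamespace(input_tokens=200, cache_creation_input_tokens=0, cache_read_input_tokens=3000)
print(input_cost_ratio(first_call), input_cost_ratio(repeat_call))
```

In a real loop you would call input_cost_ratio(response.usage). If both cache counters stay at zero on repeat calls, the prefix is probably below the minimum length or is changing between calls.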
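Setting cache_control to {"type": "ephemeral"} marks a cache breakpoint: the API caches the whole prompt prefix up to and including that block (the tools, then the system prompt). The cache lives for 5 minutes and the timer resets every time it is hit, which covers most agent loops. For longer operations you can add more breakpoints on later messages (up to four per request), so the growing message history gets cached too.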
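A sketch of the multi-breakpoint idea: before each call in the loop, move a single message-level breakpoint to the newest content block so the history so far is cached as well. The helper name is my own, and it assumes the final message's content is either a string or a list of plain dicts (true for the tool_result messages the loop builds, not for SDK response objects).

```python
import copy

def with_message_cache_breakpoint(messages: list[dict]) -> list[dict]:
    """Return a copy of messages with one cache breakpoint on the newest content block.

    Breakpoints left on earlier messages are removed, so calling this every
    iteration never exceeds the API's limit of four breakpoints per request.
    """
    marked = copy.deepcopy(messages)
    # Drop stale message-level breakpoints from previous iterations
    for message in marked:
        if isinstance(message["content"], list):
            for block in message["content"]:
                if isinstance(block, dict):
                    block.pop("cache_control", None)
    last = marked[-1]
    # A plain-string message has to become a text block to carry cache_control
    if isinstance(last["content"], str):
        last["content"] = [{"type": "text", "text": last["content"]}]
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return marked

history = [{"role": "user", "content": "Analyze inventory for SKU-123"}]
marked = with_message_cache_breakpoint(history)
print(marked[0]["content"][0])
```

You would pass with_message_cache_breakpoint(messages) as the messages argument. Together with the system-prompt breakpoint that uses two of the four allowed, and because the helper works on a copy, your own history stays untouched.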
--
Handling Failure Gracefully
Production agents fail. Here's how to handle it without your entire workflow breaking.
Tool execution errors:
```python
def execute_tool(name: str, inputs: dict) -> dict:
    try:
        if name == "get_product_inventory":
            # get_inventory is your own inventory lookup
            return get_inventory(inputs["sku"])
        # ... other tools
        # Nothing matched: without this, an unknown tool name silently returns None
        raise ValueError(f"Unknown tool: {name}")
    except Exception as e:
        # Return error as structured data so Claude can decide what to do
        return {
            "error": True,
            "error_type": type(e).__name__,
            "message": str(e),
            "recoverable": isinstance(e, (TimeoutError, ConnectionError))
        }
```
When you return structured error data instead of raising an exception, Claude can often recover: retrying the operation, trying an alternative approach, or explaining to the user what happened.
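A self-contained version of that error path you can run without the API. The get_inventory stub, the TOOL_HANDLERS dispatch table and to_tool_result are illustrative; is_error is an optional flag on tool_result blocks that tells Claude the call failed rather than returned data.

```python
import json

# Hypothetical stand-in for a real inventory service
def get_inventory(sku: str) -> dict:
    if sku == "SKU-SLOW":
        raise TimeoutError("inventory service timed out")
    return {"sku": sku, "quantity": 47}

# Dispatch table: tool name -> function taking the tool's input dict
TOOL_HANDLERS = {
    "get_product_inventory": lambda inputs: get_inventory(inputs["sku"]),
}

def execute_tool(name: str, inputs: dict) -> dict:
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": True, "error_type": "UnknownTool",
                "message": f"No tool named {name!r}", "recoverable": False}
    try:
        return handler(inputs)
    except Exception as e:
        return {"error": True, "error_type": type(e).__name__, "message": str(e),
                "recoverable": isinstance(e, (TimeoutError, ConnectionError))}

def to_tool_result(tool_use_id: str, result: dict) -> dict:
    block = {"type": "tool_result", "tool_use_id": tool_use_id,
             "content": json.dumps(result)}
    # Mark failed calls so Claude treats the content as an error report
    if result.get("error"):
        block["is_error"] = True
    return block

print(execute_tool("get_product_inventory", {"sku": "SKU-SLOW"}))
```

A missing "sku" surfaces as a non-recoverable KeyError, while a timeout is flagged recoverable, which is exactly the signal Claude needs to decide between retrying and giving up.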
Infinite loop protection:
```python
def run_agent(task: str, max_iterations: int = 10):
    messages = [{"role": "user", "content": task}]
    iterations = 0
    while iterations < max_iterations:
        iterations += 1
        response = client.messages.create(...)  # same arguments as the minimal agent
        if response.stop_reason == "end_turn":
            return response.content[0].text
        # ... handle tool use
    return "Agent reached maximum iterations without completing the task."
```
Set max_iterations based on your task complexity. Simple lookups: 5. Complex multi-step operations: 15-20.
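Because the guard only fires in the bad case, it's worth a test that runs without the API. This sketch differs from the version above in one way: it takes the client as a parameter so a fake can be injected. AlwaysToolUseClient is a hypothetical test double built from SimpleNamespace objects that mimic only the attributes the loop reads.

```python
from types import SimpleNamespace

class AlwaysToolUseClient:
    """Hypothetical test double for anthropic.Anthropic() that never finishes."""

    def __init__(self):
        self.calls = 0
        self.messages = self  # lets the loop call client.messages.create(...)

    def create(self, **kwargs):
        self.calls += 1
        tool_use = SimpleNamespace(
            type="tool_use", id=f"toolu_{self.calls}",
            name="get_product_inventory", input={"sku": "SKU-123"},
        )
        return SimpleNamespace(stop_reason="tool_use", content=[tool_use])

def run_agent(client, task: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iterations):
        # A real call would also pass model, max_tokens and tools
        response = client.messages.create(messages=messages)
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        # Feed back a dummy result for every tool_use block
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": b.id, "content": "{}"}
            for b in response.content if b.type == "tool_use"
        ]})
    return "Agent reached maximum iterations without completing the task."

fake = AlwaysToolUseClient()
result = run_agent(fake, "Check SKU-123", max_iterations=5)
print(result)
print(fake.calls)  # 5
```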
--
Multi-Agent Patterns
Single agents are powerful. Multiple agents working together can handle complexity that would overwhelm any single context window.
The pattern I use most: orchestrator + specialists.
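```python
# Specialists handle specific domains
def route_to_inventory_agent(query: str) -> dict:
    # run_specialized_agent: a run_agent variant that takes its own
    # system prompt and tool set (not shown)
    return run_specialized_agent(
        system="You are an inventory specialist...",
        tools=[get_inventory, update_inventory, get_sales_velocity],
        task=query
    )

# route_to_pricing_agent and route_to_supplier_agent follow the same pattern

# Orchestrator decides what needs to happen. This assumes a run_agent variant
# that accepts a tools argument and exposes each routing function as a tool.
orchestrator_result = run_agent(
    task="Analyze our inventory situation and create a reorder plan",
    tools=[route_to_inventory_agent, route_to_pricing_agent, route_to_supplier_agent]
)
```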
The orchestrator never touches raw data. It coordinates specialists who do. This keeps each agent's context focused and its tool set manageable.
In my experience, a single agent with 30 tools is harder to debug and less reliable than three agents with 10 tools each.
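The routing functions only work if the orchestrator sees them as tools. A sketch of that wiring under stated assumptions: run_specialized_agent is stubbed out here, and the schema, the ROUTES table and execute_orchestrator_tool are my own illustration of one way to connect a tool_use block to a specialist.

```python
import json

def run_specialized_agent(system: str, tools: list, task: str) -> dict:
    # Stub: a real version would run the agent loop with this system prompt and tools
    return {"status": "ok", "task": task, "tool_count": len(tools)}

def route_to_inventory_agent(query: str) -> dict:
    return run_specialized_agent(
        system="You are an inventory specialist...",
        tools=["get_inventory", "update_inventory", "get_sales_velocity"],  # placeholders for schemas
        task=query,
    )

# What the orchestrator actually sees: a schema, not the Python function
ORCHESTRATOR_TOOLS = [
    {
        "name": "route_to_inventory_agent",
        "description": "Delegate an inventory question (stock levels, reorder needs, "
                       "sales velocity) to the inventory specialist. Pass a "
                       "self-contained question; the specialist cannot see this conversation.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Self-contained question for the specialist"}
            },
            "required": ["query"],
        },
    },
]

ROUTES = {"route_to_inventory_agent": route_to_inventory_agent}

def execute_orchestrator_tool(name: str, inputs: dict) -> str:
    # The specialist's answer goes back to the orchestrator as tool_result content
    return json.dumps(ROUTES[name](inputs["query"]))

print(execute_orchestrator_tool("route_to_inventory_agent", {"query": "Which SKUs are below 10 units?"}))
```

Because each specialist receives only the query string, it starts with a fresh, small context; that isolation is the point of the pattern, which is why the description asks the orchestrator to make its questions self-contained.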
--
Streaming for Long Operations
For operations that take more than a few seconds, streaming makes the experience dramatically better.
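```python
with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    tools=tools,
    messages=messages
) as stream:
    # text_stream yields only the text deltas, as they arrive
    for text in stream.text_stream:
        print(text, end="", flush=True)
    # The fully assembled message (stop_reason, tool_use blocks) for the agent loop
    response = stream.get_final_message()
```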
This is particularly valuable for agents that generate reports or analysis as their final output. Users see progress instead of waiting for a spinner.
--
The Observability Problem
The hardest part of running agents in production isn't building them. It's understanding what they did when something goes wrong.
My solution: log every tool call and result.
```python
import time
from datetime import datetime, timezone

def execute_tool_with_logging(name: str, inputs: dict) -> dict:
    started_at = datetime.now(timezone.utc)  # wall-clock time for the log
    start = time.perf_counter()              # monotonic clock for the duration
    result = execute_tool(name, inputs)
    duration_ms = (time.perf_counter() - start) * 1000
    log_entry = {
        "timestamp": started_at.isoformat(),
        "tool": name,
        "inputs": inputs,
        "result": result,
        "duration_ms": duration_ms
    }
    # Write to your logging system (append_to_agent_log is your own sink)
    append_to_agent_log(log_entry)
    return result
```
This log lets you reconstruct exactly what an agent did, in what order, with what data. When a bug appears (and it will), you won't be debugging blind.
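append_to_agent_log is left to your logging system. One simple option is a JSON Lines file: one JSON object per line, append-only, readable back in write order. A minimal sketch; the file location, both function names and the demo entries are illustrative.

```python
import json
import tempfile
from pathlib import Path

def append_to_agent_log(log_entry: dict, path: Path = Path("agent_log.jsonl")) -> None:
    """Append one tool-call record as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        # default=str stops values json can't encode (e.g. datetimes) from crashing the logger
        f.write(json.dumps(log_entry, default=str) + "\n")

def read_agent_log(path: Path = Path("agent_log.jsonl")) -> list[dict]:
    """Load every record in the order it was written."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo against a throwaway file
demo_path = Path(tempfile.mkdtemp()) / "agent_log.jsonl"
append_to_agent_log({"tool": "get_product_inventory", "inputs": {"sku": "SKU-123"},
                     "result": {"quantity": 47}, "duration_ms": 12.5}, demo_path)
append_to_agent_log({"tool": "update_inventory", "inputs": {"sku": "SKU-123", "quantity_delta": -2},
                     "result": {"quantity": 45}, "duration_ms": 8.1}, demo_path)
print([entry["tool"] for entry in read_agent_log(demo_path)])
```

If several agents share one file, adding a run identifier to each entry lets you filter a single run before replaying it.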
--
Starting Simple, Scaling Up
Here's the progression I'd recommend:
Week 1: Build one agent with two or three tools for a task you currently do manually. Don't optimize. Just get it working.
Week 2: Add error handling and logging. Run it in production but monitor it closely.
Week 3: Add prompt caching. Measure the cost and latency improvement.
Month 2: Extract specialists for different domains. Build the orchestrator pattern.
The agents I run today took about six months to reach their current form. They didn't start that way. They started with three tools and grew as I understood what they needed to do.
--
Resources
The tools I've built for managing Claude agents are available at mynextools.com - including workflow templates and a monitoring dashboard for tracking agent runs.
The full Anthropic SDK documentation is thorough and worth reading: the tool use guide in particular covers edge cases I didn't have space for here.
What are you trying to automate with Claude agents? Drop it in the comments - I read every one and try to cover the most common use cases in future posts.
If you found this useful, follow me here. I publish a new deep-dive every week.