Programmatic tool calling allows Claude to write code that calls your tools programmatically within a code execution container, rather than requiring round trips through the model for each tool invocation. This reduces latency for multi-tool workflows and decreases token consumption by allowing Claude to filter or process data before it reaches the model's context window. On agentic search benchmarks like BrowseComp and DeepSearchQA, which test multi-step web research and complex information retrieval, adding programmatic tool calling on top of basic search tools improved performance by an average of 11% while using 24% fewer input tokens (see Improved web search with dynamic filtering).
Consider checking budget compliance across 20 employees: the traditional approach requires 20 separate model round-trips, pulling thousands of expense line items into the context along the way. With programmatic tool calling, a single script runs all 20 lookups, filters the results, and returns only the employees who exceeded their limits, shrinking what Claude needs to reason over from hundreds of kilobytes down to a handful of lines.
For a deeper look at the inference and context costs that programmatic tool calling addresses, see Advanced tool use.
This feature requires the code execution tool to be enabled.
This feature is not eligible for Zero Data Retention (ZDR). Data is retained according to the feature's standard retention policy.
Programmatic tool calling requires code_execution_20260120 or later, which is supported on the following models:
| Model |
|---|
| Claude Fable 5 (claude-fable-5) |
| Claude Mythos 5 (claude-mythos-5) |
| Claude Opus 4.8 (claude-opus-4-8) |
| Claude Opus 4.7 (claude-opus-4-7) |
| Claude Opus 4.6 (claude-opus-4-6) |
| Claude Sonnet 4.6 (claude-sonnet-4-6) |
| Claude Opus 4.5 (claude-opus-4-5-20251101) |
| Claude Sonnet 4.5 (claude-sonnet-4-5-20250929) |
For the full code execution tool version matrix, see the code execution tool model compatibility table. Programmatic tool calling is available on the Claude API, Claude Platform on AWS, and Microsoft Foundry. It is not currently available on Amazon Bedrock or Google Cloud.
Here's an example where Claude programmatically queries a database multiple times and aggregates results:
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-8",
max_tokens=4096,
messages=[
{
"role": "user",
"content": "Query sales data for the West, East, and Central regions, then tell me which region had the highest revenue",
}
],
tools=[
{"type": "code_execution_20260120", "name": "code_execution"},
{
"name": "query_database",
"description": "Execute a SQL query against the sales database. Returns a list of rows as JSON objects.",
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SQL query to execute"}
},
"required": ["sql"],
},
"allowed_callers": ["code_execution_20260120"],
},
],
)
print(response)

The response stops with stop_reason: "tool_use", a container ID, and a tool_use block for query_database whose caller field identifies the code execution run that called it. Return the result as shown in Step 3 of the example workflow so the code can finish.
When you configure a tool to be callable from code execution and Claude decides to use that tool:
- When Claude's code calls the tool, the API pauses and returns a tool_use block

This approach is particularly useful for:
Tools that allow a code execution caller are exposed to Claude's code as async Python functions, so Claude can run them in parallel with asyncio.gather. Each function takes a single dict of arguments and returns a string: the text of the tool_result you send back. Claude's code awaits these functions with top-level await and parses results that it needs as structured data, for example rows = json.loads(await query_database({"sql": "<sql>"})).
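Here's a minimal sketch of that calling convention. The `query_database` function below is a local stub with made-up rows, standing in for the function the container would provide, and `asyncio.run` replaces the top-level await that is available inside the container.

```python
import asyncio
import json

# Made-up rows keyed by placeholder SQL; purely illustrative data.
FAKE_ROWS = {
    "<sql for West>": [{"revenue": 70}, {"revenue": 50}],
    "<sql for East>": [{"revenue": 95}],
    "<sql for Central>": [{"revenue": 140}],
}


async def query_database(args: dict) -> str:
    """Stub tool: takes one dict of arguments, returns the tool_result text."""
    await asyncio.sleep(0)  # yield control, as a pending tool call would
    return json.dumps(FAKE_ROWS[args["sql"]])


async def main() -> dict:
    regions = ["West", "East", "Central"]
    # Issue all three calls concurrently instead of one after another.
    raw = await asyncio.gather(
        *(query_database({"sql": f"<sql for {r}>"}) for r in regions)
    )
    # Each result is a string, so parse it before treating it as rows.
    return {
        r: sum(row["revenue"] for row in json.loads(text))
        for r, text in zip(regions, raw)
    }


totals = asyncio.run(main())
top_region = max(totals, key=totals.get)
print(top_region, totals[top_region])
```

Because the calls are gathered, the three lookups are pending at the same time rather than sequentially.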
allowed_callers field

The allowed_callers field specifies which contexts can invoke a tool:
{
"name": "query_database",
"description": "Execute a SQL query against the database",
"input_schema": {
// ...
},
"allowed_callers": ["code_execution_20260120"]
}

Possible values:

- ["direct"] - Claude is guided to call this tool directly (default if omitted)
- ["code_execution_20260120"] - Claude is guided to call this tool only from within code execution
- ["direct", "code_execution_20260120"] - Claude may call this tool directly or from within code execution

Both "code_execution_20260120" and "code_execution_20260521" are accepted in allowed_callers and are interchangeable: a request using either code-execution tool version satisfies tools that list either caller. Response blocks always tag the caller as code_execution_20260120 regardless of which version the request declared.
Choose either ["direct"] or ["code_execution_20260120"] for each tool rather than enabling both, as this provides clearer guidance to Claude for how best to use the tool.
allowed_callers controls how the tool is presented to Claude and is validated against tool_choice, but it is not a hard API-level block on direct invocation. Claude is strongly guided to respect it, but your client should still be prepared to handle a direct tool_use for any tool it defines. Do not rely on allowed_callers as a security boundary.
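Since allowed_callers is guidance rather than enforcement, one option is to check the caller on your side before executing anything. The sketch below assumes tool_use blocks are handled as plain dicts; `LOCAL_POLICY`, `get_weather`, and `is_call_permitted` are hypothetical names, not part of the API.

```python
# Hypothetical local policy: caller types each tool may be executed for.
LOCAL_POLICY = {
    "query_database": {"code_execution_20260120"},
    "get_weather": {"direct"},
}


def is_call_permitted(tool_use: dict, policy: dict = LOCAL_POLICY) -> bool:
    """Return True only if the block's caller type is allowed by local policy."""
    allowed = policy.get(tool_use["name"], set())
    # Assumption of this sketch: a block without a caller field is treated as direct.
    caller_type = tool_use.get("caller", {}).get("type", "direct")
    return caller_type in allowed


direct_call = {
    "type": "tool_use",
    "id": "toolu_abc123",
    "name": "query_database",
    "input": {"sql": "<sql>"},
    "caller": {"type": "direct"},
}
print(is_call_permitted(direct_call))  # direct use of a code-only tool is refused
```

Unknown tools fall through to an empty set, so they are refused by default.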
caller field in responses

Every tool use block includes a caller field indicating how it was invoked:
Direct invocation (traditional tool use):
{
"type": "tool_use",
"id": "toolu_abc123",
"name": "query_database",
"input": { "sql": "<sql>" },
"caller": { "type": "direct" }
}

Programmatic invocation:
{
"type": "tool_use",
"id": "toolu_xyz789",
"name": "query_database",
"input": { "sql": "<sql>" },
"caller": {
"type": "code_execution_20260120",
"tool_id": "srvtoolu_abc123"
}
}

The tool_id is the id of the code execution server_tool_use block that made the call, so you can match each programmatic tool_use to the code execution run that produced it.
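As an illustration of that matching, the sketch below groups programmatic tool_use blocks by the tool_id of the run that issued them. It assumes response content is available as a list of plain dicts; `group_programmatic_calls` is a hypothetical helper name.

```python
from collections import defaultdict


def group_programmatic_calls(content: list) -> dict:
    """Map each code execution run's id to the tool_use blocks it produced."""
    groups = defaultdict(list)
    for block in content:
        if block.get("type") != "tool_use":
            continue  # skip text, server_tool_use, and other block types
        caller = block.get("caller", {})
        if caller.get("type", "direct") != "direct":
            groups[caller["tool_id"]].append(block)
    return dict(groups)


content = [
    {"type": "server_tool_use", "id": "srvtoolu_abc123", "name": "code_execution"},
    {
        "type": "tool_use",
        "id": "toolu_xyz789",
        "name": "query_database",
        "input": {"sql": "<sql>"},
        "caller": {"type": "code_execution_20260120", "tool_id": "srvtoolu_abc123"},
    },
    {
        "type": "tool_use",
        "id": "toolu_abc123",
        "name": "query_database",
        "input": {"sql": "<sql>"},
        "caller": {"type": "direct"},
    },
]
grouped = group_programmatic_calls(content)
print({run_id: [b["id"] for b in blocks] for run_id, blocks in grouped.items()})
```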
Programmatic tool calling uses the same containers as code execution:
- The container ID is returned in the container field, along with an expires_at timestamp
- expires_at tells you how long the container has left. Idle containers are currently reclaimed after about 5 minutes, and no container can be reused more than 30 days after it was created.

While Claude's code is waiting for a programmatic tool result, the pending call times out after about 4 minutes and raises a TimeoutError inside the code. Return each tool result well before the expires_at timestamp on the paused response. See Container expiration during tool call.
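One way to keep an eye on these limits is to turn expires_at into a rough time budget. This is a sketch under assumptions: the 240-second constant approximates the "about 4 minutes" above, and `seconds_until_expiry` and `result_budget_seconds` are hypothetical helpers.

```python
from datetime import datetime, timezone

# Approximation of the "about 4 minutes" pending-call timeout; not an exact API constant.
TOOL_CALL_TIMEOUT_SECONDS = 240


def seconds_until_expiry(expires_at: str, now: datetime) -> float:
    """Seconds between now and the container's expires_at timestamp."""
    # Older Python versions do not parse a trailing "Z", so normalize it first.
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    return (expiry - now).total_seconds()


def result_budget_seconds(expires_at: str, now: datetime) -> float:
    """Rough upper bound on how long you can take to return a tool result."""
    return min(TOOL_CALL_TIMEOUT_SECONDS, seconds_until_expiry(expires_at, now))


now = datetime.now(timezone.utc)
print(result_budget_seconds("2026-01-20T14:30:00Z", now))
```

The budget is measured from `now`, so treat it as an upper bound: the pending-call timeout started when the call was issued, not when you computed the budget.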
Here's how a complete programmatic tool calling flow works:
Send a request with code execution and a tool that allows programmatic calling. To enable programmatic calling, add the allowed_callers field to your tool definition.
Provide detailed descriptions of your tool's output format in the tool description. If you specify that the tool returns JSON, Claude attempts to deserialize and process the result in code. The more detail you provide about the output schema, the better Claude can handle the response programmatically.
The request shape is identical to the Quick start example: include code_execution in your tools list, add allowed_callers: ["code_execution_20260120"] to any tool you want Claude to invoke from code, and send your user message. The remaining steps in this workflow use the user message "Query customer purchase history from the last quarter and identify our top 5 customers by revenue".
Claude writes code that calls your tool. The API pauses and returns:
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "I'll query the purchase history and analyze the results."
},
{
"type": "server_tool_use",
"id": "srvtoolu_abc123",
"name": "code_execution",
"input": {
"code": "import json\n\nrows = json.loads(await query_database({'sql': '<sql>'}))\ntop_customers = sorted(rows, key=lambda x: x['revenue'], reverse=True)[:5]\nprint(f'Top 5 customers: {top_customers}')"
}
},
{
"type": "tool_use",
"id": "toolu_def456",
"name": "query_database",
"input": { "sql": "<sql>" },
"caller": {
"type": "code_execution_20260120",
"tool_id": "srvtoolu_abc123"
}
}
],
"container": {
"id": "container_xyz789",
"expires_at": "2026-01-20T14:30:00Z"
},
"stop_reason": "tool_use"
}

Send the full conversation history plus your tool result. Three details matter on this request:

- The new user message must contain only tool_result blocks. See Message formatting restrictions.
- Pass the container ID from the paused response. The API rejects a continuation that has pending programmatic tool calls but no container ID.
- Send the same tools array as the original request. The code execution tool must still be present for the paused code to resume, and the tools you send on this request are the definitions Claude and the running code can use for the rest of the turn.

response = client.messages.create(
model="claude-opus-4-8",
max_tokens=4096,
container="container_xyz789", # Reuse the container
messages=[
{
"role": "user",
"content": "Query customer purchase history from the last quarter and identify our top 5 customers by revenue",
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "I'll query the purchase history and analyze the results.",
},
{
"type": "server_tool_use",
"id": "srvtoolu_abc123",
"name": "code_execution",
"input": {"code": "..."},
},
{
"type": "tool_use",
"id": "toolu_def456",
"name": "query_database",
"input": {"sql": "<sql>"},
"caller": {
"type": "code_execution_20260120",
"tool_id": "srvtoolu_abc123",
},
},
],
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_def456",
"content": '[{"customer_id": "C1", "revenue": 45000}, {"customer_id": "C2", "revenue": 38000}, ...]',
}
],
},
],
# Same tools array as the original request
tools=[
{"type": "code_execution_20260120", "name": "code_execution"},
{
"name": "query_database",
"description": "Execute a SQL query against the sales database. Returns a list of rows as JSON objects.",
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SQL query to execute"}
},
"required": ["sql"],
},
"allowed_callers": ["code_execution_20260120"],
},
],
)
print(response)

The code picks up where it paused and processes your result. Each continuation response either pauses again with more programmatic tool_use blocks, or completes the code execution and lets Claude continue the turn (Step 5). Check stop_reason and each tool_use block's caller to tell the two apart: a response that pauses for you has stop_reason: "tool_use" and a tool_use block whose caller names a code execution version, and you repeat Step 3 with a tool_result for every pending programmatic call in one user message.
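The check described above can be sketched as a small helper that finds pending programmatic calls and builds the tool_result-only user message for the next request. It works on plain dicts rather than SDK objects, and `pending_programmatic_calls`, `build_tool_result_message`, and the `handlers` mapping are hypothetical names.

```python
def pending_programmatic_calls(response: dict) -> list:
    """Return tool_use blocks that code execution is waiting on, if any."""
    if response.get("stop_reason") != "tool_use":
        return []
    return [
        block
        for block in response["content"]
        if block.get("type") == "tool_use"
        and block.get("caller", {}).get("type", "direct") != "direct"
    ]


def build_tool_result_message(response: dict, handlers: dict):
    """Answer every pending programmatic call in one user message, or return None."""
    calls = pending_programmatic_calls(response)
    if not calls:
        return None  # nothing pending: the turn completed or paused for another reason
    results = [
        {
            "type": "tool_result",
            "tool_use_id": call["id"],
            # Each handler takes the tool input dict and returns a string.
            "content": handlers[call["name"]](call["input"]),
        }
        for call in calls
    ]
    return {"role": "user", "content": results}
```

A caller would append the paused assistant message and this user message to the history, then resend with the same container ID and tools array.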
Once the code execution completes, Claude provides the final response:
{
"content": [
{
"type": "code_execution_tool_result",
"tool_use_id": "srvtoolu_abc123",
"content": {
"type": "code_execution_result",
"stdout": "Top 5 customers: [{'customer_id': 'C1', 'revenue': 45000}, {'customer_id': 'C2', 'revenue': 38000}, {'customer_id': 'C5', 'revenue': 32000}, {'customer_id': 'C8', 'revenue': 28500}, {'customer_id': 'C3', 'revenue': 24000}]",
"stderr": "",
"return_code": 0,
"content": []
}
},
{
"type": "text",
"text": "I've analyzed the purchase history from last quarter. Your top 5 customers generated $167,500 in total revenue, with Customer C1 leading at $45,000."
}
],
"stop_reason": "end_turn"
}

Claude can write code that processes multiple items efficiently:
# Runs inside the code execution container, where top-level await is available
import json

regions = ["West", "East", "Central", "North", "South"]
results = {}
for region in regions:
    rows = json.loads(await query_database({"sql": f"<sql for {region}>"}))
    results[region] = sum(row["revenue"] for row in rows)

# Process results programmatically
top_region = max(results.items(), key=lambda x: x[1])
print(f"Top region: {top_region[0]} with ${top_region[1]:,} in revenue")

This pattern:
Claude can stop processing as soon as success criteria are met:
endpoints = ["us-east", "eu-west", "apac"]
for endpoint in endpoints:
    status = await check_health({"endpoint": endpoint})
    if status == "healthy":
        print(f"Found healthy endpoint: {endpoint}")
        break  # Stop early, don't check remaining

import json

path = "/tmp/example.txt"
file_info = json.loads(await get_file_info({"path": path}))
# Choose which tool to call based on the first result
if file_info["size"] < 10000:
    content = await read_full_file({"path": path})
else:
    content = await read_file_summary({"path": path})
print(content)

server_id = "srv-01"
log_text = await fetch_logs({"server_id": server_id})
# Filter in code so only the matching lines reach Claude's context
errors = [line for line in log_text.splitlines() if "ERROR" in line]
print(f"Found {len(errors)} errors")
for error in errors[-10:]:  # Only return last 10 errors
    print(error)

When code execution calls a tool:
{
"type": "tool_use",
"id": "toolu_abc123",
"name": "query_database",
"input": { "sql": "<sql>" },
"caller": {
"type": "code_execution_20260120",
"tool_id": "srvtoolu_xyz789"
}
}

Your tool result is passed back to the running code:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_abc123",
"content": "[{\"customer_id\": \"C1\", \"revenue\": 45000, \"orders\": 23}, {\"customer_id\": \"C2\", \"revenue\": 38000, \"orders\": 18}, ...]"
}
]
}

When all tool calls are satisfied and code completes:
{
"type": "code_execution_tool_result",
"tool_use_id": "srvtoolu_xyz789",
"content": {
"type": "code_execution_result",
"stdout": "Analysis complete. Top 5 customers identified from 847 total records.",
"stderr": "",
"return_code": 0,
"content": []
}
}

| Error | Where it appears | Description | Solution |
|---|---|---|---|
| invalid_tool_input | error_code on the code_execution_tool_result error block in the response | Invalid parameters were passed to the code execution tool | See the code execution tool errors |
| invalid_request_error (on tool_choice) | HTTP 400 error response | tool_choice names a tool whose allowed_callers does not include "direct" | Either add "direct" to that tool's allowed_callers, or remove the tool from tool_choice and let Claude invoke it from code |
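To catch the tool_choice case before sending a request, a client-side preflight check can help. This is a sketch, not part of the SDK: `check_tool_choice` is a hypothetical helper, and it treats a missing allowed_callers as ["direct"], following the default described earlier.

```python
def check_tool_choice(tools: list, tool_choice: dict) -> None:
    """Raise ValueError if tool_choice names a tool that does not allow direct calls."""
    if tool_choice.get("type") != "tool":
        return  # only a named-tool choice can trigger this error
    for tool in tools:
        if tool.get("name") != tool_choice["name"]:
            continue
        callers = tool.get("allowed_callers", ["direct"])
        if "direct" not in callers:
            raise ValueError(
                f"tool_choice names {tool['name']!r}, whose allowed_callers "
                f"{callers} does not include 'direct'"
            )


tools = [
    {"type": "code_execution_20260120", "name": "code_execution"},
    {"name": "query_database", "allowed_callers": ["code_execution_20260120"]},
]
try:
    check_tool_choice(tools, {"type": "tool", "name": "query_database"})
except ValueError as exc:
    print(exc)
```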
If your tool result doesn't arrive within about 4 minutes, the pending call raises a TimeoutError inside Claude's running code. Claude sees the error in stderr and typically retries the call:
{
"type": "code_execution_tool_result",
"tool_use_id": "srvtoolu_abc123",
"content": {
"type": "code_execution_result",
"stdout": "",
"stderr": "TimeoutError: Calling tool ['query_database'] timed out (no response after 270s).",
"return_code": 0,
"content": []
}
}

To prevent timeouts:

- Check the expires_at field in responses

If your tool returns an error:
{
"type": "tool_result",
"tool_use_id": "toolu_abc123",
"content": "Error: Query timeout - table lock exceeded 30 seconds"
}

Claude's code receives this error and can handle it appropriately.
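On the client side, one way to produce such a result is to wrap each tool handler so that failures become an error string instead of an unanswered call. A minimal sketch, assuming handlers take the tool input dict and return a string; `run_tool_safely` and `failing_handler` are hypothetical names.

```python
def run_tool_safely(tool_use: dict, handler) -> dict:
    """Run a tool handler and always return a tool_result block."""
    try:
        content = handler(tool_use["input"])
    except Exception as exc:
        # Return the failure as text so the waiting code receives something to act on.
        content = f"Error: {exc}"
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": content,
    }


def failing_handler(args: dict) -> str:
    raise RuntimeError("Query timeout - table lock exceeded 30 seconds")


call = {"id": "toolu_abc123", "name": "query_database", "input": {"sql": "<sql>"}}
print(run_tool_safely(call, failing_handler))
```

Returning promptly with an error also avoids leaving the call pending until it times out.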
- Tools with strict: true are not supported with programmatic calling
- tool_choice cannot name a tool whose allowed_callers omits "direct"
- disable_parallel_tool_use: true is not supported with programmatic calling

The following tools cannot be called programmatically:
When responding to programmatic tool calls, there are strict formatting requirements:
Tool result only responses: If there are pending programmatic tool calls waiting for results, your response message must contain only tool_result blocks. You cannot include any text content, even after the tool results.
Invalid - Cannot include text when responding to programmatic tool calls:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01",
"content": "[{\"customer_id\": \"C1\", \"revenue\": 45000}]"
},
{ "type": "text", "text": "What should I do next?" }
]
}

Valid - Only tool results when responding to programmatic tool calls:
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01",
"content": "[{\"customer_id\": \"C1\", \"revenue\": 45000}]"
}
]
}

This restriction only applies when responding to programmatic (code execution) tool calls. For regular client-side tool calls, you can include text content after tool results.
Text-only tool result content: The content of each tool_result that answers a programmatic call must be a string or text blocks. Image, document, and other content block types are rejected.
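Both restrictions can be checked locally before sending a continuation. The sketch below is a hypothetical validator (`validate_programmatic_reply` is not an SDK function) that operates on a plain-dict user message and returns the problems it finds.

```python
def validate_programmatic_reply(message: dict) -> list:
    """Return a list of formatting problems; an empty list means none were found."""
    problems = []
    for block in message["content"]:
        if block.get("type") != "tool_result":
            problems.append(f"non-tool_result block: {block.get('type')}")
            continue
        content = block.get("content")
        if isinstance(content, str):
            continue  # plain string content is allowed
        is_text_blocks = isinstance(content, list) and all(
            isinstance(part, dict) and part.get("type") == "text" for part in content
        )
        if not is_text_blocks:
            problems.append(f"non-text content in {block.get('tool_use_id')}")
    return problems


invalid = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01", "content": "[]"},
        {"type": "text", "text": "What should I do next?"},
    ],
}
print(validate_programmatic_reply(invalid))
```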
Programmatic tool calls are subject to the same rate limits as regular tool calls. Each tool call from code execution counts as a separate invocation.
When implementing user-defined tools that will be called programmatically:
Programmatic tool calling reduces token consumption in three ways:
For example, calling 10 tools directly uses ~10x the tokens of calling them programmatically and returning a summary.
In Anthropic's internal evaluations on a production Claude model:
- Requests whose tools array contains 10 to 49 tool definitions see typical token savings of 20% to 40% with programmatic tool calling enabled.

Actual savings vary with workload shape. See When to use programmatic calling.
Programmatic tool calling uses the same pricing as code execution. See the code execution pricing for details.
Token counting for programmatic tool calls: Tool results from programmatic invocations do not count toward your input/output token usage. Only the final code execution result and Claude's response count.
Programmatic tool calling trades a small fixed overhead (container startup, script generation) for large savings on tool-result tokens and model round-trips. Whether that trade pays off depends on workload shape.
Strong fit:
Weak fit:
If you are unsure, measure billed input tokens with and without allowed_callers on a representative sample of your traffic before enabling it broadly.
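A simple way to run that comparison is to record billed input tokens for the same sample of requests under both configurations and compute the relative savings. The helper name and the token counts below are made up for illustration; in practice the numbers would come from the usage reported on your own responses.

```python
from statistics import mean


def input_token_savings(baseline: list, programmatic: list) -> float:
    """Fraction of mean input tokens saved relative to the baseline."""
    base = mean(baseline)
    return (base - mean(programmatic)) / base


# Made-up token counts for the same three requests, for illustration only.
without_allowed_callers = [1000, 1200, 800]
with_allowed_callers = [700, 800, 600]

savings = input_token_savings(without_allowed_callers, with_allowed_callers)
print(f"{savings:.0%} fewer input tokens on this sample")
```

A negative result means the fixed overhead outweighed the savings for that workload.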
invalid_request_error when setting tool_choice
tool_choice cannot name a tool whose allowed_callers omits "direct". Either add "direct" to that tool's allowed_callers, or remove the tool from tool_choice and let Claude invoke it from code.

Container expiration

Return each tool result before the container's expires_at timestamp. Claude's code stops waiting for a result after about 4 minutes, and idle containers are currently reclaimed after about 5 minutes.

Tool result not parsed correctly

Check the caller field to confirm programmatic invocation

Claude is trained on large amounts of code, so presenting tools as callable Python functions lets it use that strength:
Programmatic tool calling is a generalizable pattern that can also be implemented on your own infrastructure. Here's how the approaches compare:
Provide Claude with a code execution tool and describe what functions are available in that environment. When Claude invokes the tool with code, your application executes it locally where those functions are defined.
Advantages:
Disadvantages:
Use when: Your application can safely execute arbitrary code, you want the smallest implementation, and Anthropic's managed offering doesn't fit your needs.
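A minimal sketch of the client-side approach follows. `run_model_code` is a hypothetical helper, and the sketch does no sandboxing at all: `exec` runs the code with full access to your process, so it only fits environments where executing arbitrary code is acceptable.

```python
import contextlib
import io


def run_model_code(code: str, tool_functions: dict) -> str:
    """Execute code with the given functions in scope and return captured stdout."""
    buffer = io.StringIO()
    namespace = dict(tool_functions)  # names the code is able to call
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # no isolation: trusted environments only
    return buffer.getvalue()


def query_database(args: dict) -> list:
    """Stand-in for a real local function; returns made-up rows."""
    return [{"customer_id": "C1", "revenue": 45000}]


code_from_model = "rows = query_database({'sql': '<sql>'})\nprint(len(rows))"
print(run_model_code(code_from_model, {"query_database": query_database}))
```

The captured stdout is what you would return to Claude as the result of its code execution tool call.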
Same approach from Claude's perspective, but code runs in a sandboxed container with security restrictions (for example, no network egress). If your tools require external resources, you'll need a protocol for executing tool calls outside the sandbox.
Advantages:
Disadvantages:
Use when: Security is critical and Anthropic's managed solution doesn't fit your requirements.
Anthropic's programmatic tool calling is a managed version of sandboxed execution with an opinionated Python environment tuned for Claude. Anthropic handles container management, code execution, and secure tool invocation communication.
Advantages:
Consider using Anthropic's managed solution if you're using the Claude API, Claude Platform on AWS, or Microsoft Foundry.
Programmatic tool calling is built on the code execution infrastructure and uses the same sandbox containers. Container data, including execution artifacts and outputs, is retained for up to 30 days.
For ZDR eligibility across all features, see API and data retention.