RFC: make Agents SDK core independent of AI SDK #1795

@mattzcarey

Made with the help of agents.

Summary

The agents package should make Cloudflare's durable agent infrastructure independent of the Vercel AI SDK. AI SDK support should remain first-class, but it should live behind an adapter entry point such as agents/ai, alongside adapters for other harnesses such as pi and TanStack AI.

This is not a proposal to replace AI SDK, build another provider SDK, or invent a universal model API. It is a proposal to make the package boundary match what Agents is: durable infrastructure for stateful agent workloads. Model invocation, provider-specific messages, and framework-specific stream protocols belong at adapters and higher-level harnesses.

The first milestone should be narrow:

  1. The agents root and infrastructure entry points compile, install, and run without ai.
  2. Agents owns the small tool and schema contracts needed by MCP, skills, agent tools, Codemode, and other infrastructure.
  3. agents/ai adapts those contracts to AI SDK and preserves today's AI SDK experience.
  4. A pi adapter and the existing pi recovery harness prove that the boundary works for a second, materially different stack.
  5. Think is addressed only after the Agents boundary is sound. Think currently embeds AI SDK's agent loop and should not be made "neutral" by renaming AI SDK types.

Motivation

The package describes itself as a persistent, stateful execution environment backed by Durable Objects. Its core capabilities include state, RPC, scheduling, fibers, recovery, MCP, workflows, sub-agents, WebSockets, and observability. None of those concepts requires a particular model provider or agent loop.

The current package boundary says otherwise:

  • ai is a required peer of agents, even for users who only import Agent, routing, state, scheduling, or MCP;
  • infrastructure APIs return AI SDK types, notably MCP tools, skill tools, session tools, and agent tools;
  • agents/chat mixes generic turn/recovery infrastructure with AI SDK UIMessage and UI stream details;
  • the package has already added separate TanStack adapters and a non-AI-SDK pi recovery harness, but the common contracts still use AI SDK as their default vocabulary;
  • Think's public and private API is tied directly to LanguageModel, ModelMessage, ToolSet, streamText, PrepareStepFunction, AI SDK stream parts, and UIMessage.

This makes AI SDK look like part of the Agents runtime rather than one harness on top of it. It also makes support for pi-ai, TanStack AI, or a custom loop harder than it needs to be.

There is user demand for another harness. #984 asks for TanStack AI support, with users specifically calling AI SDK migration a blocker. More importantly, the repository has already proved that shared durable recovery can work with pi's AgentEvent and transcript vocabulary in experimental/pi-recovery.

Current state

The package root is close to independent, but is not independent

packages/agents/src/index.ts has no runtime import from ai. It does, however, import MCPClientManager, whose source imports Zod at runtime to turn MCP JSON Schema into model-facing schemas and exposes AI SDK ToolSet as a public type. The root also exports agent-tool state whose parts field is UIMessage["parts"].

The package manifest still declares both ai and zod as required peers. Zod is a separate concern from AI SDK: the current MCP SDK v1.29.0 also declares Zod as a direct dependency and a required peer for protocol parsing and schema validation.

This means a counter agent still installs AI SDK, and the root declaration graph still needs AI SDK types. It also means a Zod-free install is not currently possible while @modelcontextprotocol/sdk@1.29.0 stays in the root package graph.

AI SDK types have spread into infrastructure surfaces

Examples:

The repository already uses better boundaries elsewhere:

These are precedents for the same change at the Agents package boundary.

Why the coupling existed

The coupling was not arbitrary. AI chat originally lived in agents, and MCP conversion used AI SDK's jsonSchema() helper. Moving AI SDK to peer dependencies in #722 exposed missing or duplicate dependency problems. ai was then made required in #754 to fix #751, where bundlers could not resolve a dynamic import("ai") from the main package graph.

The code has changed since then:

  • stable AI chat moved to @cloudflare/ai-chat;
  • MCP schema conversion moved from an AI SDK runtime import to Zod;
  • Session and recovery gained structural, framework-neutral seams;
  • Codemode demonstrated a plain tool that is still accepted by AI SDK.

The required peer solved a real packaging problem in 2025. It should not be treated as a permanent architectural constraint now that the package graph is different.

Design principles

  1. Agents owns infrastructure, not a model SDK. Durable state, execution, recovery, scheduling, transport, MCP, and sub-agent orchestration belong in core. streamText, pi streamSimple, model/provider options, and framework stream parts do not.
  2. Own only the contracts core needs. Do not copy all AI SDK types. Define small structural capabilities and keep provider or harness data opaque.
  3. Prefer structural compatibility over wrappers. A tool returned by core should be usable by AI SDK directly when the shapes line up, as Codemode's current tool is. Adapters should handle genuine semantic differences, not add ceremony for identical objects.
  4. Do not force one canonical transcript. AI SDK UIMessage, pi Message, and AG-UI have different roles, parts, metadata, tool state, and provider round-trip requirements. Generic infrastructure should be parameterized over message and event types or consume narrow capability interfaces.
  5. No silent lossy conversion. Provider metadata, reasoning signatures, tool approval state, images, sources, usage, finish reasons, and errors must either round-trip or be explicitly documented as unsupported by an adapter.
  6. Compatibility first. Existing AI SDK applications should have an incremental migration path. The initial work should not require a rewrite of @cloudflare/ai-chat or Think.

Proposed architecture

1. Core infrastructure

The following surfaces should have no runtime or declaration reference to ai:

  • agents
  • agents/types
  • agents/client
  • agents/react, except for hooks that are explicitly for AI chat, which already live in @cloudflare/ai-chat
  • agents/mcp and agents/mcp/client
  • agents/workflows
  • agents/schedule
  • agents/observability
  • agents/skills
  • the generic portion of agents/chat
  • agents/agent-tools

Zod is not the primary target of this RFC. The narrower goal is that Agents-owned neutral tool/schema contracts do not require Zod or manufacture Zod objects solely for an inference harness. Applications may keep using Zod, and the MCP SDK may keep it as an internal transitive dependency.

The current MCP SDK v1.29.0 requires consumers to install Zod. The split MCP SDK v2 alpha packages move user-facing tool and prompt schemas to Standard Schema and drop Zod from peerDependencies, but still keep Zod as a direct dependency for protocol message parsing. Upgrading to v2 can let agents stop making Zod a required peer; it does not produce a dependency tree with no Zod installed. A literally Zod-free install would require either a future Zod-free MCP SDK or moving MCP into a separately installed package. MCP v2 is still pre-release and v1 remains the recommended production version, so this RFC should align with v2's schema boundary without making an immediate v2 upgrade a prerequisite.

2. Owned schema and tool contracts

Agents needs a small set of its own types. Names are illustrative.

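// Placeholder so the sketch type-checks; the issue shape is an assumption, not a settled design.
export interface AgentSchemaIssue {
  message: string;
  path?: readonly PropertyKey[];
}

// Assumed structural subset of the Standard Schema v1 interface ("~standard" property).
export interface StandardSchemaLike<T = unknown> {
  readonly "~standard": {
    readonly version: 1;
    readonly vendor: string;
    readonly validate: (value: unknown) => unknown;
    readonly types?: { readonly input: unknown; readonly output: T };
  };
}

export type AgentJsonSchema = Record<string, unknown>;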

export type AgentValidationResult<T> =
  | { success: true; value: T }
  | { success: false; issues: readonly AgentSchemaIssue[] };

export interface AgentSchema<T = unknown> {
  readonly jsonSchema: AgentJsonSchema | PromiseLike<AgentJsonSchema>;
  readonly validate?: (
    value: unknown
  ) => AgentValidationResult<T> | PromiseLike<AgentValidationResult<T>>;
}

export interface AgentToolExecutionContext {
  toolCallId?: string;
  signal?: AbortSignal;
}

export interface AgentTool<Input = unknown, Output = unknown> {
  description?: string;
  title?: string;
  inputSchema: AgentSchema<Input> | StandardSchemaLike<Input>;
  outputSchema?: AgentSchema<Output> | StandardSchemaLike<Output>;
  execute?: (
    input: Input,
    context: AgentToolExecutionContext
  ) => Output | Promise<Output>;
}

export type AgentToolSet = Record<string, AgentTool<any, any>>;

The exact schema shape needs a focused design pass. Requirements:

  • dependency-free at runtime;
  • accepts raw JSON Schema;
  • can validate when a validator is available;
  • can be implemented by Standard Schema without importing AI SDK;
  • preserves generic input/output inference;
  • can be converted to the TypeBox TSchema that pi requires;
  • is structurally accepted by AI SDK where possible.
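As a minimal sketch of how these requirements could fit together, the following wraps a raw JSON Schema in the illustrative AgentSchema shape and attaches a validator only when the caller supplies one. The helper name fromJsonSchema, the AgentSchemaIssue shape, and the hand-written validator are assumptions made for illustration. The sketch is synchronous only and is not a proposed final API.

```typescript
// Illustrative types, narrowed to the synchronous case so the block is self-contained.
type AgentJsonSchema = Record<string, unknown>;

// Assumed issue shape; the RFC leaves this undefined.
interface AgentSchemaIssue {
  message: string;
  path?: readonly (string | number)[];
}

type AgentValidationResult<T> =
  | { success: true; value: T }
  | { success: false; issues: readonly AgentSchemaIssue[] };

interface AgentSchema<T = unknown> {
  readonly jsonSchema: AgentJsonSchema;
  readonly validate?: (value: unknown) => AgentValidationResult<T>;
}

// Hypothetical helper: keeps the raw JSON Schema untouched and has no runtime dependency.
function fromJsonSchema<T = unknown>(
  jsonSchema: AgentJsonSchema,
  validate?: (value: unknown) => AgentValidationResult<T>
): AgentSchema<T> {
  return validate ? { jsonSchema, validate } : { jsonSchema };
}

// Example: a schema with a hand-written validator standing in for a real one.
const citySchema = fromJsonSchema<{ city: string }>(
  {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
  (value) => {
    if (
      typeof value === "object" &&
      value !== null &&
      typeof (value as { city?: unknown }).city === "string"
    ) {
      return { success: true, value: { city: (value as { city: string }).city } };
    }
    return {
      success: false,
      issues: [{ message: "city must be a string", path: ["city"] }],
    };
  }
);

const validInput = citySchema.validate?.({ city: "Lisbon" });
const invalidInput = citySchema.validate?.({ city: 42 });
```

A schema built without a validator still carries its JSON Schema, so an adapter can hand it to a harness that validates on its own.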

This is intentionally smaller than AI SDK Tool. Fields such as providerOptions, toModelOutput, input streaming callbacks, and needsApproval are adapter or product concerns unless core itself needs them. We should not put every AI SDK field into the core contract to claim neutrality.

Tool names should remain keys in AgentToolSet. agents/pi-ai can convert the record to pi's array of { name, description, parameters }. This avoids baking either AI SDK's keyed tool set or pi's named tool array into every API.
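The record-to-array conversion described above could look roughly like this. The NamedTool shape is taken only from the { name, description, parameters } description in this RFC; the real pi types may carry more fields. The sketch passes JSON Schema through unchanged, where a real adapter would produce the TypeBox TSchema that pi expects. toNamedTools is a hypothetical name.

```typescript
// Trimmed illustrative core types; only the fields this sketch reads.
interface AgentTool {
  description?: string;
  inputSchema: { jsonSchema: Record<string, unknown> };
}
type AgentToolSet = Record<string, AgentTool>;

// Assumed target shape for a named tool array.
interface NamedTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Hypothetical adapter helper: the record key becomes the tool name.
function toNamedTools(tools: AgentToolSet): NamedTool[] {
  return Object.entries(tools).map(([name, tool]) => ({
    name,
    description: tool.description ?? "",
    parameters: tool.inputSchema.jsonSchema,
  }));
}

const exampleTools: AgentToolSet = {
  search: {
    description: "Search the docs",
    inputSchema: { jsonSchema: { type: "object" } },
  },
  fetch: { inputSchema: { jsonSchema: { type: "object" } } },
};

const namedTools = toNamedTools(exampleTools);
```

Keeping names as record keys in core means the name lives in exactly one place, and only the adapter decides how the harness sees it.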

Execution results also need adapter policy. AI SDK permits arbitrary outputs plus toModelOutput; pi tools return text/image content with UI details. Core should not choose one as universal. Built-in core tools can return plain values. Each adapter converts those values to its harness result format and exposes an explicit escape hatch for richer native results.
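One way to picture this adapter policy is sketched below: plain values from core tools are converted to a text-content result, and an explicit marker lets a tool hand back a native result untouched. The HarnessToolResult shape, nativeResult(), and toHarnessResult() are all hypothetical names. The content-plus-details shape only loosely mirrors the pi description above and is not a real harness type.

```typescript
// Assumed harness result shape: text content plus free-form details.
interface TextContent {
  type: "text";
  text: string;
}
interface HarnessToolResult {
  content: TextContent[];
  details?: unknown;
}

// Escape hatch: a branded wrapper that the adapter passes through unchanged.
const NATIVE_RESULT = Symbol("nativeResult");
interface NativeResult {
  readonly [NATIVE_RESULT]: true;
  readonly result: HarnessToolResult;
}

function nativeResult(result: HarnessToolResult): NativeResult {
  return { [NATIVE_RESULT]: true, result };
}

function isNativeResult(value: unknown): value is NativeResult {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as Partial<NativeResult>)[NATIVE_RESULT] === true
  );
}

// Hypothetical adapter policy: strings pass through, other values are serialized as JSON.
// Assumes plain values are JSON-serializable (no BigInt, no cycles).
function toHarnessResult(value: unknown): HarnessToolResult {
  if (isNativeResult(value)) return value.result;
  const json = JSON.stringify(value) as string | undefined;
  const text = typeof value === "string" ? value : json ?? String(value);
  return { content: [{ type: "text", text }], details: value };
}
```

A built-in core tool can return { rows: 3 } and never learn which harness is running it, while a tool that needs richer native output opts in with nativeResult().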

3. Adapters

agents/ai

This entry point should own Vercel AI SDK integration:

  • adaptation of AgentSchema and AgentToolSet to AI SDK;
  • AI SDK-specific tool features and result conversion;
  • AISDKRecoveryCodec;
  • UIMessage assembly, sanitization, transcript repair, and client tool conversion;
  • AI SDK UI stream protocol types and helpers;
  • compatibility aliases needed by @cloudflare/ai-chat and Think.

ai can stay in the package's peer dependencies because npm declares peers package-wide, not per entry point. It must be marked optional and referenced only by this entry point and by deprecated AI-specific entry points.

agents/pi-ai

This adapter should start experimentally and prove two paths:

  • convert AgentToolSet to pi-ai/pi-agent-core tools, including schema validation and result conversion;
  • provide a pi recovery codec and a small durable host integration based on experimental/pi-recovery.

The adapter should target the current upstream packages, @earendil-works/pi-agent-core and @earendil-works/pi-ai, from earendil-works/pi. The existing experimental/pi-recovery harness already proves that these packages run in a Worker and can drive the shared durable recovery engine. The adapter should build on that proof. Provider-specific transport compatibility and the choice between upstream's base and batteries-included entry points still need ordinary bundle/runtime coverage, but they are not reasons to treat pi itself as an unknown fork.

Existing adapters

Keep and use the current pattern:

  • agents/browser/ai
  • agents/browser/tanstack-ai
  • @cloudflare/codemode/ai
  • @cloudflare/codemode/tanstack-ai

A later cleanup can make naming consistent. That should not block the core split.

4. Split generic chat infrastructure from AI SDK chat vocabulary

agents/chat currently contains both kinds of code.

Generic candidates include:

  • TurnQueue
  • SubmitConcurrencyController
  • AbortRegistry
  • ResumableStream
  • SQL batching
  • durable recovery incidents and budgets
  • ChatRecoveryEngine
  • resume scheduling and terminal replay policy
  • stall watchdog
  • narrow snapshot types such as { id?, role }

AI SDK-specific candidates include:

  • message-builder.ts
  • StreamAccumulator as currently typed
  • sanitize.ts
  • AI SDK transcript repair and reconciliation
  • createToolsFromClientSchemas()
  • lifecycle types that expose UIMessage and AI SDK message parts
  • AISDKRecoveryCodec

We should not move files only to create a new name. First make each generic module generic or narrow enough that its declaration graph is independent. Then expose a stable internal core subpath and have agents/ai assemble the AI SDK chat adapter. During migration, agents/chat can remain a compatibility barrel for @cloudflare/ai-chat and Think.
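To illustrate what "generic or narrow enough" can mean in practice, the sketch below shows a helper that reads only a { id?, role } snapshot yet returns the host's native message type. The two message interfaces are toy stand-ins, not the real AI SDK or pi types, and lastWithRole is a hypothetical helper, not an existing export.

```typescript
// Narrow capability: the only fields the generic code is allowed to read.
interface MessageSnapshot {
  id?: string;
  role: string;
}

// Generic over the host message type M, so native fields survive untouched.
function lastWithRole<M extends MessageSnapshot>(
  transcript: readonly M[],
  role: string
): M | undefined {
  for (let i = transcript.length - 1; i >= 0; i--) {
    const message = transcript[i];
    if (message !== undefined && message.role === role) return message;
  }
  return undefined;
}

// Toy stand-ins for two different host vocabularies.
interface UiStyleMessage {
  id: string;
  role: "user" | "assistant";
  parts: { type: "text"; text: string }[];
}
interface PiStyleMessage {
  role: "user" | "assistant" | "toolResult";
  content: string;
  timestamp: number;
}

const uiTranscript: UiStyleMessage[] = [
  { id: "m1", role: "user", parts: [{ type: "text", text: "hi" }] },
  { id: "m2", role: "assistant", parts: [{ type: "text", text: "hello" }] },
];
const piTranscript: PiStyleMessage[] = [
  { role: "user", content: "hi", timestamp: 1 },
  { role: "assistant", content: "calling tool", timestamp: 2 },
  { role: "toolResult", content: "42", timestamp: 3 },
];

// The return types are UiStyleMessage | undefined and PiStyleMessage | undefined.
const lastUiAssistant = lastWithRole(uiTranscript, "assistant");
const lastPiToolResult = lastWithRole(piTranscript, "toolResult");
```

Because the helper's declaration mentions only MessageSnapshot, its emitted types need neither harness installed.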

Changes by subsystem

MCP

Add a framework-neutral tool conversion API, for example getTools() or asTools(), returning AgentToolSet. Keep getAITools() as a deprecated compatibility alias for a release window if the returned object remains structurally AI SDK-compatible.

MCP already gives us JSON Schema. Agents should not convert that schema to Zod merely to produce an AI SDK tool. It should preserve the JSON Schema behind the neutral schema contract, using the configured MCP JSON Schema validator when runtime validation is needed. agents/ai and agents/pi-ai can then apply their native schema form.
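A rough sketch of that conversion follows. It relies only on the name, description, and inputSchema fields of an MCP tool listing. The CallTool callback and mcpToolsToAgentToolSet are hypothetical names, and the neutral types are trimmed to what the sketch needs. It also leaves out any server-prefixing of tool names that a real implementation might apply.

```typescript
type AgentJsonSchema = Record<string, unknown>;

// Trimmed illustrative core types.
interface AgentTool {
  description?: string;
  inputSchema: { jsonSchema: AgentJsonSchema };
  execute?: (input: unknown) => Promise<unknown>;
}
type AgentToolSet = Record<string, AgentTool>;

// Fields of an MCP tool listing entry that this sketch uses.
interface McpToolDescriptor {
  name: string;
  description?: string;
  inputSchema: AgentJsonSchema;
}

// Hypothetical callback that forwards a call to the connected MCP server.
type CallTool = (name: string, args: unknown) => Promise<unknown>;

function mcpToolsToAgentToolSet(
  descriptors: readonly McpToolDescriptor[],
  callTool: CallTool
): AgentToolSet {
  const tools: AgentToolSet = {};
  for (const descriptor of descriptors) {
    const tool: AgentTool = {
      // The JSON Schema object is kept by reference: no Zod or AI SDK conversion.
      inputSchema: { jsonSchema: descriptor.inputSchema },
      execute: (input) => callTool(descriptor.name, input),
    };
    if (descriptor.description !== undefined) {
      tool.description = descriptor.description;
    }
    tools[descriptor.name] = tool;
  }
  return tools;
}
```

Each adapter can then wrap inputSchema.jsonSchema in whatever schema form its harness prefers, at its own boundary.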

The MCP SDK v2 alpha is directionally aligned with this RFC: its user-facing tool and prompt APIs accept StandardSchemaWithJSON, and fromJsonSchema() adapts raw JSON Schema. It still uses Zod internally for protocol parsing, so the useful boundary is "no Zod in Agents-owned tool APIs", not "no Zod anywhere in node_modules". The existing MCP v2 upgrade work can inform this design and later simplify the package metadata, but the neutral Agents tool contract does not need to wait for MCP v2.

Agent tools

agentTool() only needs description, input schema, toolCallId, abort, execution, and optional output validation. It should return AgentTool, not call AI SDK's identity tool() helper. The current durable child-run behavior is already independent of model invocation.

AgentToolRunState.parts should be generic or use an owned display-event type instead of UIMessage["parts"]. AI rendering can adapt it.

Skills

SkillRegistry.tools(), runner({ tools }), and sandbox tool dispatch should use AgentToolSet. Built-in schemas should be dependency-free Standard Schema values or owned schemas, not Zod schemas that exist solely for AI SDK tool construction.

Session

SessionMessage and SessionMessagePart are a good precedent. Finish the split by returning AgentToolSet from Session context tools. Do not turn SessionMessage into a complete universal model message.

Codemode and browser

Codemode's root tool is already the desired pattern: an owned plain tool, dependency-free schema, and an AI SDK conformance test. Reuse that pattern rather than creating a second incompatible tool contract.

Browser already has explicit AI SDK and TanStack adapter paths. Its framework-neutral connector and quick actions should consume the core tool contract where they need tools.

Think comes second

Think is not merely typed with AI SDK. It currently owns an AI SDK agent loop:

  • getModel() returns AI SDK LanguageModel;
  • getTools() returns ToolSet;
  • _runInferenceLoop() calls streamText() directly;
  • messages are converted with convertToModelMessages();
  • stop conditions use stepCountIs() and hasToolCall();
  • beforeStep, chunk, step-finish, and tool-finish hooks expose AI SDK callback types;
  • structured output, telemetry, transforms, provider options, and tool choice are passed through from AI SDK;
  • the durable transcript and UI protocol use UIMessage parts.

Trying to replace these names with broad owned types would copy AI SDK and still fail to represent pi accurately.

The staged Think direction should be:

  1. Use neutral tools internally. Think's built-in workspace, extension, MCP, session, skill, browser, and agent tools return AgentToolSet. The existing AI SDK loop adapts them at one boundary.
  2. Characterize the turn driver. Document what durable Think actually needs from an inference harness: input transcript, stream of model/tool events, abort, tool execution, usage/progress, finish/error, and continuation checkpoints.
  3. Extract an internal AI SDK driver. Move streamText setup and AI SDK callback plumbing behind a composition seam while preserving the existing Think public API.
  4. Build a pi spike. Use pi-ai or pi-agent-core through a second driver. Test text, reasoning, tools, abort, continuation, recovery, and provider metadata before deciding whether the seam is real.
  5. Choose the public API after the spike. Options include Think remaining the AI SDK product with a new ThinkCore, driver selection on a new base class, or separate @cloudflare/think/ai and /pi-ai entry points. Do not commit to one before the second driver works.

The minimal RFC should not promise a provider-neutral Think in its first implementation. The first deliverable is an AI-independent Agents core plus evidence that Think can consume neutral tools.

Migration plan

Phase 0: dependency and API map

  • Add a machine-readable allowlist of entry points and their framework dependencies.
  • Add import-graph checks over emitted JavaScript and declarations.
  • Add clean-room package tests that install the packed agents tarball with peers omitted.
  • Capture source compatibility tests for current AI SDK tool APIs.
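The allowlist and import-graph check could start as something like the sketch below. It scans emitted text with a regular expression, which is an assumption made for brevity; a real check would more likely parse the output, since a regex can be fooled by comments and string literals. The allowlist contents and all function names are hypothetical.

```typescript
// Hypothetical allowlist: entry point -> framework packages it may reference.
const allowedFrameworks: Record<string, readonly string[]> = {
  "agents": [],
  "agents/mcp": [],
  "agents/ai": ["ai", "@ai-sdk"],
};

// Packages and scopes that count as framework dependencies in this sketch.
const FRAMEWORK_PACKAGES: readonly string[] = [
  "ai",
  "@ai-sdk",
  "@cloudflare/ai-chat",
  "@earendil-works/pi-ai",
  "@earendil-works/pi-agent-core",
];

// Collects specifiers from static imports, re-exports, dynamic import(), and require().
function importSpecifiers(source: string): string[] {
  const pattern =
    /(?:\bfrom\s*|\bimport\s*\(\s*|\bimport\s+|\brequire\s*\(\s*)["']([^"']+)["']/g;
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined) found.push(specifier);
  }
  return found;
}

// True for the package itself or any subpath of it ("ai" matches "ai/react", not "ai-foo").
function matchesPackage(specifier: string, pkg: string): boolean {
  return specifier === pkg || specifier.startsWith(pkg + "/");
}

// Returns framework specifiers that the given entry point is not allowed to reference.
function forbiddenImports(entry: string, emitted: string): string[] {
  const allowed = allowedFrameworks[entry] ?? [];
  return importSpecifiers(emitted).filter(
    (specifier) =>
      FRAMEWORK_PACKAGES.some((pkg) => matchesPackage(specifier, pkg)) &&
      !allowed.some((pkg) => matchesPackage(specifier, pkg))
  );
}
```

Running the same function over both .js and .d.ts output is what would catch a type-only leak.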

Phase 1: owned schema and tool contract

  • Define AgentSchema, AgentTool, AgentToolSet, and execution context.
  • Reconcile this with Codemode's CodemodeTool so the repository has one structural convention.
  • Add conformance tests against AI SDK 6, pi-ai/pi-agent-core, MCP schemas, Zod 4, TypeBox, and raw JSON Schema.

Phase 2: leaf migrations

Move low-risk producers first:

  • Codemode runtime tools;
  • agentTool();
  • MCP tool conversion;
  • Session context tools;
  • skill tools and runner;
  • browser connector tool inputs.

Keep deprecated aliases where the same object remains structurally compatible.

Phase 3: package boundary

  • Add agents/ai.
  • Move or re-export AI SDK-specific helpers through it.
  • Mark ai optional in peerDependenciesMeta.
  • Ensure root JavaScript and declarations have no ai reference.
  • Remove Agents-owned Zod conversion from root paths where it exists only to create model-facing schemas.
  • Decide how to handle the intentionally public Zod surfaces that remain, especially agents/schedule's scheduleSchema and Zod-typed x402 helpers. They can move to owned/Standard Schema values, move behind compatibility entry points, or keep Zod as an optional peer for those subpaths.
  • Adopt the MCP v2 Standard Schema surface when that upgrade is production-viable, allowing agents to make Zod optional rather than required while tolerating it as an MCP-internal direct dependency.

Phase 4: chat boundary

  • Separate the generic chat/recovery modules from the AI SDK message codec.
  • Parameterize stores and recovery hooks over host message/part types where needed.
  • Preserve agents/chat as an internal compatibility barrel during the transition.
  • Keep the existing @cloudflare/ai-chat behavior and wire protocol unchanged.

Phase 5: pi adapter

  • Turn experimental/pi-recovery into adapter contract tests.
  • Add real tool execution, not only text recovery.
  • Verify clean Worker builds and measure bundle size for upstream's base and batteries-included entry points.
  • Test the provider transports Cloudflare intends to document as supported.

Phase 6: Think neutral tools

  • Adapt all built-in Think tools at the AI SDK loop boundary.
  • Keep Think and its public lifecycle hooks source-compatible.
  • Add tests proving a core tool can run through both Think/AI SDK and pi.

Phase 7: Think driver experiment

  • Extract the AI SDK turn driver internally.
  • Build a pi driver spike.
  • Write a follow-up RFC for the public Think API based on what the spike teaches us.

Compatibility and release strategy

This can be mostly additive until the final cleanup:

  • AI SDK remains fully supported.
  • Existing ToolSet-shaped return values should continue to work through structural typing.
  • Existing AI-specific subpaths remain during a deprecation window.
  • getAITools() can remain as an alias while getTools() lands.
  • @cloudflare/ai-chat and current Think users should not need to change in the first release.
  • Removing deprecated root exports or changing public AI SDK hook types belongs in a major release.

Making ai optional is safe only after emitted root declarations and JavaScript no longer reference it. Repeating the #751 failure with a type-only leak would not be acceptable.

Acceptance criteria

Agents core

  • A clean project can install a packed agents tarball with optional peer dependencies omitted, import the root, instantiate an Agent, use state/RPC/scheduling/fibers, and bundle a Worker without ai installed.
  • Consumers do not need to declare Zod merely to use Agents core. Zod may remain installed transitively by the MCP SDK until that SDK no longer uses it internally.
  • Emitted Agents JavaScript and declarations do not import Zod for the neutral tool/schema contract. Explicit Zod compatibility entry points and MCP's own internal dependency are outside this assertion.
  • Emitted JavaScript and .d.ts for agreed core entry points contain no import, re-export, or dynamic import of ai, @ai-sdk/*, @cloudflare/ai-chat, or pi packages.
  • Importing an adapter without its optional peer gives a clear error at that adapter boundary, not while importing agents.

Tools

  • One core tool executes through AI SDK streamText with input validation and typed output.
  • The same core tool executes through pi-agent-core with schema validation, abort propagation, and tool-call identity preserved.
  • MCP tools, skill tools, agent tools, and Codemode tools use the same core contract.
  • Adapter tests cover raw JSON Schema, Zod 4/Standard Schema, TypeBox/pi, invalid input, async validation, output conversion, and abort.

Chat and recovery

  • Existing AI chat recovery, approvals, client tools, provider metadata, reasoning signatures, and transcript repair tests remain green.
  • The pi recovery SIGKILL test remains green against the shared engine.
  • Generic recovery modules can typecheck in a fixture with neither AI SDK nor pi installed.

Think

  • Think's built-in tools are core tools adapted once at the inference boundary.
  • Existing Think applications remain source-compatible during the first migration.
  • No public claim of general Think driver support is made until the pi driver passes text, reasoning, tool, abort, recovery, usage, and metadata tests.

Risks

We accidentally rebuild AI SDK

The main failure mode is expanding AgentTool, messages, or turn drivers until they mirror every AI SDK field. The guardrail is simple: core gets a field only when durable infrastructure itself needs it. Everything else stays native to an adapter.

A lowest-common-denominator transcript loses important data

AI SDK provider metadata and reasoning signatures are required for valid round trips. Pi has its own provider/model/usage envelope and tool-result content. We should keep native transcripts and event vocabularies, then pass opaque data through generic infrastructure.

Structural compatibility is mistaken for semantic compatibility

Two tool objects can typecheck while differing on validation, approval, abort, result formatting, or error behavior. Cross-adapter conformance tests must assert behavior, not only assignment.
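The difference between assignment and behavior can be shown with a small conformance probe. In this sketch, two adapters satisfy the same function type, but only one validates input before executing. Every name here (ProbeTool, AdapterInvoke, conformanceFailures, and both adapters) is hypothetical and synchronous for brevity. A real suite would also cover abort, approval, and error formatting.

```typescript
interface ToolContext {
  toolCallId?: string;
}
type Validation =
  | { success: true; value: { n: number } }
  | { success: false; issues: string[] };
interface ProbeTool {
  validate(value: unknown): Validation;
  execute(input: { n: number }, context: ToolContext): number;
}
type InvokeResult = { ok: true; output: number } | { ok: false; error: string };
type AdapterInvoke = (tool: ProbeTool, rawInput: unknown, context: ToolContext) => InvokeResult;

// Runs behavioral checks against an adapter and returns a description of each failure.
function conformanceFailures(invoke: AdapterInvoke): string[] {
  const failures: string[] = [];
  const seen: { executions: number; toolCallId?: string } = { executions: 0 };
  const probe: ProbeTool = {
    validate: (value) =>
      typeof value === "object" &&
      value !== null &&
      typeof (value as { n?: unknown }).n === "number"
        ? { success: true, value: { n: (value as { n: number }).n } }
        : { success: false, issues: ["n must be a number"] },
    execute: (input, context) => {
      seen.executions += 1;
      if (context.toolCallId !== undefined) seen.toolCallId = context.toolCallId;
      return input.n * 2;
    },
  };

  const bad = invoke(probe, { n: "oops" }, { toolCallId: "call-1" });
  if (bad.ok || seen.executions !== 0) failures.push("invalid input reached execute");

  const good = invoke(probe, { n: 21 }, { toolCallId: "call-2" });
  if (!good.ok || good.output !== 42) failures.push("valid input produced the wrong output");
  if (seen.toolCallId !== "call-2") failures.push("toolCallId was not propagated");
  return failures;
}

// Validates first, then executes.
const validatingAdapter: AdapterInvoke = (tool, rawInput, context) => {
  const checked = tool.validate(rawInput);
  if (!checked.success) return { ok: false, error: checked.issues.join("; ") };
  return { ok: true, output: tool.execute(checked.value, context) };
};

// Typechecks against the same signature but skips validation.
const trustingAdapter: AdapterInvoke = (tool, rawInput, context) => ({
  ok: true,
  output: tool.execute(rawInput as { n: number }, context),
});
```

Both adapters are assignable to AdapterInvoke, so only the behavioral probe tells them apart.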

Package subpaths do not fully isolate peers

Peer dependencies are declared package-wide. The package can still use optional peers for adapter subpaths, but clean-room tests must cover npm and pnpm installs, bundler builds, and the emitted JavaScript and declarations. Codemode #1791 is the precedent.

MCP still brings Zod transitively

MCP SDK v2 removes Zod from its peer dependencies and accepts Standard Schema for user-facing tool and prompt schemas, but it still uses Zod internally for protocol parsing. This RFC can remove Zod from Agents-owned neutral APIs and stop requiring every user to declare it. Zod may remain an optional peer for explicit Zod compatibility subpaths. The RFC cannot promise a dependency tree with no Zod unless MCP removes that runtime dependency or MCP moves to a separately installed package.

Provider transports still need Worker coverage

The existing pi recovery harness proves upstream @earendil-works/pi-agent-core and @earendil-works/pi-ai work in the Workers runtime. It uses a deterministic faux provider. The production adapter should still test the selected upstream entry point and each supported provider transport in Workers, especially where a provider SDK has Node-specific code. This is a provider compatibility matrix, not uncertainty about which pi project is upstream.

Think's public hooks are AI SDK APIs

That is expected. Compatibility wrappers may leave Think AI SDK-specific for some time. The RFC should not trade a clear AI SDK product for a fake generic API.

Alternatives considered

Keep AI SDK required and add one-off pi helpers

This is the smallest change, but it leaves infrastructure APIs and installation tied to AI SDK. Every future harness would need to adapt around AI SDK types.

Make ai optional without changing declarations

This repeats the class of failure from #751. Optional metadata is packaging, not architecture.

Use AI SDK types as the neutral contract

Pi and TanStack would remain second-class, and core would still change when AI SDK changes. This does not meet the goal.

Define a universal model and message API now

This is too broad. Provider options, prompt conversion, tool semantics, stream events, usage, finish reasons, and UI parts differ enough that a useful universal API needs evidence from working adapters. Start with tools and durable infrastructure seams.

Make Think generic first

Think's 14,000-line main implementation directly orchestrates AI SDK streamText. Starting there mixes package-boundary work with an agent-loop rewrite. Agents core should go first.

Open questions

  1. Should the adapter be agents/ai, agents/ai-sdk, or a separate package? agents/ai matches the requested direction, while a separate package gives stronger dependency isolation.
  2. Should core directly model Standard Schema, depend on @standard-schema/spec as types, or expose a smaller AgentSchema with adapters for Standard Schema?
  3. Which current agents/chat exports are public commitments versus sibling-package internals that can move without a major release?
  4. Can getAITools() return the new structural AgentToolSet without breaking inference for existing users, or should it remain in an AI adapter until the next major?
  5. Should the pi adapter use upstream's base entry points and register selected providers explicitly, or use the batteries-included entry points? The existing Worker recovery proof uses @earendil-works/*; production provider and bundle-size tests should decide the entry point.
  6. Does a future neutral Think base own the agent loop, or should each driver own its native loop? The pi spike should answer this.
  7. Should @cloudflare/ai-chat eventually consume agents/ai only, leaving agents/chat entirely generic?
