Think
@cloudflare/think lets you build a stateful AI chat agent — one that streams replies, remembers the conversation, and calls tools — by extending a single base class. You provide a model with getModel(), and Think wires up the rest of the chat lifecycle for you: the agentic loop (the model calls tools, reads the results, and keeps going until it has an answer), message persistence, streaming, client tools, stream resumption, and extensions — all backed by Durable Object SQLite.
Think works as both a top-level agent (WebSocket chat to browser clients via useAgentChat) and a sub-agent (a child agent that another agent drives over RPC via chat()).
```sh
npm install @cloudflare/think @cloudflare/ai-chat agents ai @cloudflare/shell zod workers-ai-provider
```

JavaScript:

```js
import { Think } from "@cloudflare/think";
import { createWorkersAI } from "workers-ai-provider";
import { routeAgentRequest } from "agents";

export class MyAgent extends Think {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.6",
    );
  }
}

export default {
  async fetch(request, env) {
    return (
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  },
};
```

TypeScript:

```ts
import { Think } from "@cloudflare/think";
import { createWorkersAI } from "workers-ai-provider";
import { routeAgentRequest } from "agents";

export class MyAgent extends Think<Env> {
  getModel() {
    return createWorkersAI({ binding: this.env.AI })(
      "@cf/moonshotai/kimi-k2.6",
    );
  }
}

export default {
  async fetch(request: Request, env: Env) {
    return (
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  },
} satisfies ExportedHandler<Env>;
```

That is it. Think handles the WebSocket chat protocol, message persistence, the agentic loop, message sanitization, stream resumption, client tool support, and workspace file tools.
React client (JavaScript):

```jsx
import { useAgent } from "agents/react";
import { useAgentChat } from "@cloudflare/ai-chat/react";

function Chat() {
  const agent = useAgent({ agent: "MyAgent" });
  const { messages, sendMessage, status } = useAgentChat({ agent });

  return (
    <div>
      {messages.map((msg) => (
        <div key={msg.id}>
          <strong>{msg.role}:</strong>
          {msg.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null,
          )}
        </div>
      ))}

      <form
        onSubmit={(e) => {
          e.preventDefault();
          const input = e.currentTarget.elements.namedItem("input");
          sendMessage({ text: input.value });
          input.value = "";
        }}
      >
        <input name="input" placeholder="Send a message..." />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

React client (TypeScript):

```tsx
import { useAgent } from "agents/react";
import { useAgentChat } from "@cloudflare/ai-chat/react";

function Chat() {
  const agent = useAgent({ agent: "MyAgent" });
  const { messages, sendMessage, status } = useAgentChat({ agent });

  return (
    <div>
      {messages.map((msg) => (
        <div key={msg.id}>
          <strong>{msg.role}:</strong>
          {msg.parts.map((part, i) =>
            part.type === "text" ? <span key={i}>{part.text}</span> : null,
          )}
        </div>
      ))}

      <form
        onSubmit={(e) => {
          e.preventDefault();
          const input = e.currentTarget.elements.namedItem(
            "input",
          ) as HTMLInputElement;
          sendMessage({ text: input.value });
          input.value = "";
        }}
      >
        <input name="input" placeholder="Send a message..." />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```

wrangler.jsonc:

```jsonc
{
  "$schema": "./node_modules/wrangler/config-schema.json",
  // Set this to today's date
  "compatibility_date": "2026-06-27",
  "compatibility_flags": ["nodejs_compat"],
  "ai": {
    "binding": "AI"
  },
  "durable_objects": {
    "bindings": [
      {
        "class_name": "MyAgent",
        "name": "MyAgent"
      }
    ]
  },
  "migrations": [
    {
      "new_sqlite_classes": ["MyAgent"],
      "tag": "v1"
    }
  ]
}
```

wrangler.toml:

```toml
# Set this to today's date
compatibility_date = "2026-06-27"
compatibility_flags = ["nodejs_compat"]

[ai]
binding = "AI"

[[durable_objects.bindings]]
class_name = "MyAgent"
name = "MyAgent"

[[migrations]]
new_sqlite_classes = ["MyAgent"]
tag = "v1"
```

Both Think and AIChatAgent extend Agent and speak the same cf_agent_chat_* WebSocket protocol. They serve different goals.
AIChatAgent is a protocol adapter. You override onChatMessage and are responsible for calling streamText, wiring tools, converting messages, and returning a Response. AIChatAgent handles the plumbing — message persistence, streaming, abort, resume — but the LLM call is entirely your concern.
Think is an opinionated framework. It makes decisions for you: getModel() returns the model, getSystemPrompt() or configureSession() sets the prompt, getTools() returns tools. The default onChatMessage runs the complete agentic loop. You override individual pieces, not the whole pipeline.
| Concern | AIChatAgent | Think |
|---|---|---|
| Minimal subclass | ~15 lines (wire streamText + tools + system prompt + response) | 3 lines (getModel() only) |
| Storage | Flat SQL table | Session: tree-structured messages, context blocks, compaction, FTS5 |
| Regeneration | Destructive (old response deleted) | Non-destructive branching (old responses preserved) |
| Context management | Manual | Context blocks with LLM-writable persistent memory |
| Sub-agent RPC | Not built in | chat() with StreamCallback |
| Programmatic turns | saveMessages() | saveMessages(), submitMessages(), continueLastTurn() |
| Compaction | maxPersistedMessages (deletes oldest) | Non-destructive summaries via overlays |
| Search | Not available | FTS5 full-text search per-session and cross-session |
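To make "agentic loop" concrete, here is a minimal sketch of the control flow that phrase describes: ask the model, run any tool it requests, feed the result back, and stop when it produces an answer. Everything in it (`runAgenticLoop`, `ModelStep`, the `clock` tool, the fake model) is hypothetical and written for illustration; it is not Think's implementation, and the model stub is synchronous only to keep the example short.

```typescript
// Hypothetical types for illustration only; these are not Think's real types.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelStep =
  | { kind: "tool-call"; call: ToolCall }
  | { kind: "answer"; text: string };
type Tool = (args: Record<string, unknown>) => string;

// A model looks at the transcript so far and decides the next step.
// Real model calls are asynchronous; this stub is synchronous for brevity.
type Model = (transcript: string[]) => ModelStep;

function runAgenticLoop(
  model: Model,
  tools: Record<string, Tool>,
  prompt: string,
  maxSteps = 8,
): string {
  const transcript: string[] = [`user: ${prompt}`];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(transcript);
    if (next.kind === "answer") {
      return next.text; // the model has an answer, so the loop ends
    }
    const tool = tools[next.call.tool];
    const result = tool
      ? tool(next.call.args)
      : `error: unknown tool ${next.call.tool}`;
    // Feed the tool result back so the next model step can read it.
    transcript.push(`tool(${next.call.tool}): ${result}`);
  }
  throw new Error(`no answer after ${maxSteps} steps`);
}

// Fake model: requests the clock tool once, then answers from its result.
const clockPrefix = "tool(clock): ";
const fakeModel: Model = (transcript) => {
  const toolLine = transcript.find((line) => line.startsWith(clockPrefix));
  return toolLine
    ? { kind: "answer", text: `It is ${toolLine.slice(clockPrefix.length)}` }
    : { kind: "tool-call", call: { tool: "clock", args: {} } };
};

const loopAnswer = runAgenticLoop(
  fakeModel,
  { clock: () => "12:00" },
  "What time is it?",
);
```

The step cap is a design choice of this sketch: a loop that depends on model output needs some bound so a model that never answers cannot run forever.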
Choose AIChatAgent when:
- You need full control over the LLM call (RAG, multi-model, custom streaming)
- You want the Response return type for HTTP middleware or testing
- You are building a simple chatbot with no memory requirements

Choose Think when:
- You want to ship fast (3-line subclass with everything wired)
- You need persistent memory (context blocks the model can read and write)
- You need long conversations (non-destructive compaction)
- You need conversation search (FTS5)
- You are building a sub-agent system (parent-child RPC with streaming)
- You need proactive agents (programmatic turns from scheduled tasks or webhooks)
- You need durable async submission for webhook or RPC callers
Think has several ways to start or continue a turn. They all funnel through one public entry point — runTurn(options) — and the older methods remain as convenience shortcuts.
runTurn() is the unified turn-admission API. One method, three modes, selected by options.mode:
| Mode | Use when | Returns | Shortcut for |
|---|---|---|---|
| "wait" (default) | The caller can block until the model response is finished | Promise<TurnResult> | saveMessages() |
| "submit" | The caller needs fast, durable acceptance and a later status | Promise<SubmitMessagesResult> | submitMessages() |
| "stream" | The caller wants the response streamed to a callback (RPC) | Promise<void> | chat() |
The input accepts a string, a UIMessage, an array of messages, or — in wait and stream modes — a function (current) => UIMessage[] evaluated at admission. (submit does not accept function input.)
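The accepted input shapes can be pictured as a union that is normalized to a message array at admission. The sketch below is a rough model of that rule under stated assumptions: `SketchMessage` is a simplified stand-in for UIMessage, and `resolveTurnInput`, `SketchTurnInput`, and `newId` are hypothetical names, not part of the Think API. How Think itself treats the array returned by a function input is not specified here, so the sketch simply returns it unchanged.

```typescript
// Hypothetical, simplified message shape standing in for UIMessage.
type SketchMessage = {
  id: string;
  role: "user" | "assistant" | "system";
  parts: { type: "text"; text: string }[];
};

// The four input shapes described above, as one union.
type SketchTurnInput =
  | string
  | SketchMessage
  | SketchMessage[]
  | ((current: SketchMessage[]) => SketchMessage[]);

type SketchMode = "wait" | "submit" | "stream";

function resolveTurnInput(
  input: SketchTurnInput,
  mode: SketchMode,
  current: SketchMessage[],
  newId: () => string,
): SketchMessage[] {
  if (typeof input === "function") {
    // Function input is only allowed in wait and stream modes.
    if (mode === "submit") {
      throw new Error("submit mode does not accept function input");
    }
    return input(current); // evaluated at admission against the current messages
  }
  if (typeof input === "string") {
    // A bare string becomes a single user text message.
    return [{ id: newId(), role: "user", parts: [{ type: "text", text: input }] }];
  }
  return Array.isArray(input) ? input : [input];
}

const fromString = resolveTurnInput("Summarize the latest thread", "wait", [], () => "m1");
const fromFunction = resolveTurnInput((current) => current.slice(-1), "stream", fromString, () => "m2");
```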
JavaScript:

```js
import { Think } from "@cloudflare/think";

export class Assistant extends Think {
  async examples(inboundEventId) {
    // wait: block for the result
    const result = await this.runTurn({ input: "Summarize the latest thread" });
    if (result.status === "completed") {
      // result.message is the assistant message; result.continuation is false
    }

    // submit: durable acceptance, check status later
    const submission = await this.runTurn({
      mode: "submit",
      input: "Process this webhook",
      idempotencyKey: inboundEventId, // dedupe; safe to retry
    });
    // submission.accepted is true on first accept; submission.status is "pending"

    // stream: drive a callback (the same surface as chat())
    await this.runTurn({
      mode: "stream",
      input: "Stream me",
      callback: {
        onStart({ requestId }) {},
        onEvent(json) {}, // UIMessageChunk JSON
        onDone() {},
        onError(error) {},
      },
    });

    // continuation: continue the last assistant turn instead of sending input
    await this.runTurn({ continuation: true });
  }
}
```

TypeScript:

```ts
import { Think } from "@cloudflare/think";

export class Assistant extends Think<Env> {
  async examples(inboundEventId: string) {
    // wait: block for the result
    const result = await this.runTurn({ input: "Summarize the latest thread" });
    if (result.status === "completed") {
      // result.message is the assistant message; result.continuation is false
    }

    // submit: durable acceptance, check status later
    const submission = await this.runTurn({
      mode: "submit",
      input: "Process this webhook",
      idempotencyKey: inboundEventId, // dedupe; safe to retry
    });
    // submission.accepted is true on first accept; submission.status is "pending"

    // stream: drive a callback (the same surface as chat())
    await this.runTurn({
      mode: "stream",
      input: "Stream me",
      callback: {
        onStart({ requestId }) {},
        onEvent(json) {}, // UIMessageChunk JSON
        onDone() {},
        onError(error) {},
      },
    });

    // continuation: continue the last assistant turn instead of sending input
    await this.runTurn({ continuation: true });
  }
}
```

Key behaviors:
- Blocking modes cannot nest. Calling wait/stream/continuation (or the equivalent shortcut) from inside an active turn, for example from a tool's execute, throws, because it would deadlock the turn queue. From inside a turn, use runTurn({ mode: "submit" }) (durable, runs after the current turn frees the queue) or addMessages() (transcript only, no inference).
- submit is idempotent. Pass submissionId and/or idempotencyKey; re-submitting a known key returns the existing record with accepted: false instead of starting a second turn. See Programmatic submissions.
- Recovery-safe. When chatRecovery is enabled, the wait, stream, and drained submit paths all run inference inside a recovery fiber, so an interrupted turn resumes after eviction.
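The idempotency rule for submit can be illustrated with a small in-memory model. This is a sketch of the semantics only, assuming a single idempotencyKey per submission: `SketchSubmissionLog` and its record shape are hypothetical, and Think persists submissions durably rather than in a Map.

```typescript
// Hypothetical in-memory model of idempotent submission; the real record
// shape and storage in Think may differ.
type SketchSubmission = {
  submissionId: string;
  status: "pending" | "completed";
  input: string;
};
type SketchSubmitResult = SketchSubmission & { accepted: boolean };

class SketchSubmissionLog {
  private byKey = new Map<string, SketchSubmission>();
  private nextId = 1;

  submit(input: string, idempotencyKey: string): SketchSubmitResult {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) {
      // Known key: return the existing record and do not queue a second turn.
      return { ...existing, accepted: false };
    }
    const record: SketchSubmission = {
      submissionId: `sub-${this.nextId++}`,
      status: "pending",
      input,
    };
    this.byKey.set(idempotencyKey, record);
    return { ...record, accepted: true };
  }

  // Number of distinct submissions recorded so far.
  get size(): number {
    return this.byKey.size;
  }
}

// A webhook delivered twice with the same event id produces one submission.
const submissions = new SketchSubmissionLog();
const firstDelivery = submissions.submit("Process this webhook", "event-42");
const retryDelivery = submissions.submit("Process this webhook", "event-42");
```

This is why a caller that times out can retry safely: the retry observes the earlier record instead of starting duplicate work.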
runTurn is exported alongside its option and result types: RunTurnOptions, RunTurnWait, RunTurnSubmit, RunTurnStream, TurnInputMessages, and TurnResult.
The table below maps each scenario to the most direct call. The shortcuts keep their existing signatures; reach for one when you want the narrower surface, or use runTurn() when you want one mental model.
| Use case | API |
|---|---|
| A browser user sends chat messages | useAgentChat over the WebSocket chat protocol |
| Server code can wait for the model response | saveMessages() |
| Server code needs fast durable acceptance and later status | submitMessages() |
| Code should create recurring prompt-driven turns or handlers | getScheduledTasks() |
| Parent code needs direct streaming RPC to a specific child | subAgent(...).chat() |
| A parent delegates work to a retained child agent | agentTool() or runAgentTool() |
| Surround a turn with idempotent app-owned side effects | startFiber() |
| Coordinate multi-step durable orchestration | Workflows |
| Add context or messages without starting a model turn | addMessages() |
| Advanced subclass or recovery code continues an assistant turn | continueLastTurn() |
Use saveMessages() when the caller owns the trigger and can wait for the turn to finish. Use submitMessages() when timeout ambiguity would make retries unsafe.
Use addMessages() to write to the transcript without starting a model turn — for importing prior history or injecting background context the next turn should see:
JavaScript:

```js
import { Think } from "@cloudflare/think";

export class Assistant extends Think {
  async importContext() {
    await this.addMessages([
      {
        id: crypto.randomUUID(),
        role: "user",
        parts: [{ type: "text", text: "Imported context" }],
      },
    ]);
  }
}
```

TypeScript:

```ts
import { Think } from "@cloudflare/think";

export class Assistant extends Think<Env> {
  async importContext() {
    await this.addMessages([
      {
        id: crypto.randomUUID(),
        role: "user",
        parts: [{ type: "text", text: "Imported context" }],
      },
    ]);
  }
}
```

addMessages() appends (or upserts) into the Session tree:
- It does not run inference and does not enter the turn queue, so it is safe to call from inside a tool's execute without deadlocking.
- Array entries are appended linearly (each attaches under the previous one), so imported history stays a single path. By default the first message attaches to the latest committed leaf; pass parentId to attach elsewhere, or null for a root message.
- Appends are idempotent by message id. Pass { mode: "upsert" } to update an existing message in place instead.
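The append rules above can be modeled with a small in-memory tree. This is a sketch under assumptions, not the Session storage layer: `SketchTree`, `TreeMessage`, and the `"append"` mode name are hypothetical, and the choice to keep chaining from a message that already exists is a simplification made for this example.

```typescript
// Hypothetical in-memory model of the append rules; not the real Session storage.
type TreeMessage = { id: string; text: string };
type StoredMessage = TreeMessage & { parentId: string | null };

class SketchTree {
  private nodes = new Map<string, StoredMessage>();
  private latestLeaf: string | null = null;

  add(
    messages: TreeMessage[],
    options: { parentId?: string | null; mode?: "append" | "upsert" } = {},
  ): void {
    // Default parent is the latest leaf; an explicit null means "root message".
    let parent =
      options.parentId !== undefined ? options.parentId : this.latestLeaf;
    for (const message of messages) {
      const existing = this.nodes.get(message.id);
      if (existing) {
        // Appends are idempotent by id; upsert updates the message in place.
        if (options.mode === "upsert") existing.text = message.text;
        parent = existing.id; // sketch assumption: keep chaining from it
        continue;
      }
      // Each new entry attaches under the previous one, forming a single path.
      this.nodes.set(message.id, { ...message, parentId: parent });
      parent = message.id;
      this.latestLeaf = message.id;
    }
  }

  get(id: string): StoredMessage | undefined {
    return this.nodes.get(id);
  }

  get size(): number {
    return this.nodes.size;
  }
}

const tree = new SketchTree();
tree.add([
  { id: "a", text: "one" },
  { id: "b", text: "two" },
]);
tree.add([{ id: "a", text: "changed" }]); // repeated id: no duplicate, no change
const textAfterAppend = tree.get("a")?.text;
tree.add([{ id: "a", text: "changed" }], { mode: "upsert" }); // updates in place
const textAfterUpsert = tree.get("a")?.text;
```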
The supported pattern is "add context, then run a turn": call addMessages(), then runTurn().
Use chat() for low-level parent-to-child streaming when your code owns forwarding, cancellation, and replay policy. Use Agents as tools when a parent model or workflow delegates to a child agent and you want retained child runs, event replay, abort bridging, and UI drill-in.
Use startFiber() outside Think when the durable unit is an application job around a turn: accepting a webhook once, restoring a serialized channel or thread target, posting a visible reply, or recording app-level recovery policy. Think submissions own conversation admission and turn serialization; managed fibers own external job acceptance, idempotent side effects, and application recovery.
Think's design is inspired by Pi.
- Sessions: context blocks, compaction, search, multi-session (the storage layer Think builds on)
- Sub-agents: subAgent(), abortSubAgent(), deleteSubAgent() (the base Agent methods for spawning children)
- Chat agents: AIChatAgent for when you need full control over the LLM call
- Long-running agents: sub-agent delegation patterns for multi-week agent lifetimes
- Durable execution: runFiber() and crash recovery (used by chatRecovery)
- Browse the web: full CDP helper API reference