|
1 | 1 | # Cloudflare Pipelines |
2 | 2 |
|
3 | | -ETL streaming platform for ingesting, transforming, and loading data into R2 with SQL transformations. |
| 3 | +Streaming ingest: receive events over HTTP/Workers/Logpush, transform with SQL, write to R2 as Iceberg tables or Parquet/JSON files. |
4 | 4 |
|
5 | | -## Overview |
| 5 | +## Documentation |
6 | 6 |
|
7 | | -Pipelines provides: |
8 | | -- **Streams**: Durable event buffers (HTTP/Workers ingestion) |
9 | | -- **Pipelines**: SQL-based transformations |
10 | | -- **Sinks**: R2 destinations (Iceberg tables or Parquet/JSON files) |
| 7 | +This reference is a fast-start with verified code and gotchas. For limits, settings, full SQL syntax, and pricing, **retrieve the live docs** — use the `cloudflare-docs` MCP/search tool if available, otherwise `webfetch` the URL. Docs are source of truth over this file. |
11 | 8 |
|
12 | | -**Status**: Open beta (Workers Paid plan) |
13 | | -**Pricing**: No charge beyond standard R2 storage/operations |
| 9 | +| Topic | URL | |
| 10 | +|-------|-----| |
| 11 | +| Overview / getting started | `https://developers.cloudflare.com/pipelines/getting-started/` | |
| 12 | +| Streams (write, manage, Logpush) | `https://developers.cloudflare.com/pipelines/streams/` | |
| 13 | +| Sinks | `https://developers.cloudflare.com/pipelines/sinks/` | |
| 14 | +| Pipelines & SQL transforms | `https://developers.cloudflare.com/pipelines/pipelines/` | |
| 15 | +| SQL reference (statements, types) | `https://developers.cloudflare.com/pipelines/sql-reference/` | |
| 16 | +| Wrangler commands | `https://developers.cloudflare.com/pipelines/reference/wrangler-commands/` | |
| 17 | +| Terraform | `https://developers.cloudflare.com/pipelines/reference/terraform/` | |
| 18 | +| Limits | `https://developers.cloudflare.com/pipelines/platform/limits/` | |
| 19 | +| Pricing | `https://developers.cloudflare.com/pipelines/platform/pricing/` | |
| 20 | +| Metrics (GraphQL) | `https://developers.cloudflare.com/pipelines/observability/metrics/` | |
14 | 21 |
|
15 | | -## Architecture |
| 22 | +## Three Components |
16 | 23 |
|
17 | 24 | ``` |
18 | | -Data Sources → Streams → Pipelines (SQL) → Sinks → R2 |
19 | | - ↑ ↓ ↓ |
20 | | - HTTP/Workers Transform Iceberg/Parquet |
| 25 | +Sources → Stream → Pipeline (SQL) → Sink → R2 |
| 26 | + ↑ ↓ ↓ |
| 27 | + HTTP / Workers / Transform Iceberg (Data Catalog) |
| 28 | + Logpush (row-level) or Parquet/JSON files |
21 | 29 | ``` |
22 | 30 |
|
23 | | -| Component | Purpose | Key Feature | |
24 | | -|-----------|---------|-------------| |
25 | | -| Streams | Event ingestion | Structured (validated) or unstructured | |
26 | | -| Pipelines | Transform with SQL | Immutable after creation | |
27 | | -| Sinks | Write to R2 | Exactly-once delivery | |
| 31 | +| Component | Purpose | |
| 32 | +|-----------|---------| |
| 33 | +| **Stream** | Receives events (HTTP endpoint, Worker binding, or Logpush). Structured (schema-validated) or unstructured. | |
| 34 | +| **Pipeline** | SQL connecting a stream to a sink. Row-level transforms only — no GROUP BY/aggregation. | |
| 35 | +| **Sink** | Writes to R2 — Iceberg via Data Catalog, or raw Parquet/JSON. | |
| 36 | + |
| 37 | +**Status:** Open beta (Workers Paid for production). Pricing announced; verify billing status in docs. |
28 | 38 |
|
29 | 39 | ## Quick Start |
30 | 40 |
|
31 | 41 | ```bash |
32 | | -# Interactive setup (recommended) |
| 42 | +# Interactive — creates stream + sink + pipeline, optionally bucket + catalog |
33 | 43 | npx wrangler pipelines setup |
34 | 44 | ``` |
35 | 45 |
|
36 | | -**Minimal Worker example:** |
| 46 | +Minimal Worker producer: |
37 | 47 | ```typescript |
38 | | -interface Env { |
39 | | - STREAM: Pipeline; |
40 | | -} |
| 48 | +interface Env { MY_STREAM: Pipeline; } |
41 | 49 |
|
42 | 50 | export default { |
43 | | - async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> { |
44 | | - const event = { user_id: "123", event_type: "purchase", amount: 29.99 }; |
45 | | - |
46 | | - // Fire-and-forget pattern |
47 | | - ctx.waitUntil(env.STREAM.send([event])); |
48 | | - |
49 | | - return new Response('OK'); |
| 51 | + async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> { |
| 52 | + ctx.waitUntil(env.MY_STREAM.send([{ event_id: crypto.randomUUID(), amount: 29.99 }])); |
| 53 | + return new Response("OK"); |
50 | 54 | } |
51 | 55 | } satisfies ExportedHandler<Env>; |
52 | 56 | ``` |
53 | 57 |
|
54 | 58 | ## Which Sink Type? |
55 | 59 |
|
56 | 60 | ``` |
57 | | -Need SQL queries on data? |
58 | | - → R2 Data Catalog (Iceberg) |
59 | | - ✅ ACID transactions, time-travel, schema evolution |
60 | | - ❌ More setup complexity (namespace, table, catalog token) |
61 | | -
|
62 | | -Just file storage/archival? |
63 | | - → R2 Storage (Parquet) |
64 | | - ✅ Simple, direct file access |
65 | | - ❌ No built-in SQL queries |
66 | | -
|
67 | | -Using external tools (Spark/Athena)? |
68 | | - → R2 Storage (Parquet with partitioning) |
69 | | - ✅ Standard format, partition pruning for performance |
70 | | - ❌ Must manage schema compatibility yourself |
71 | | -``` |
72 | | - |
73 | | -## Common Use Cases |
| 61 | +Need SQL queries / ACID / time-travel on the data? |
| 62 | + → R2 Data Catalog (Iceberg) ✅ R2 SQL, schema evolution ❌ more setup |
74 | 63 |
|
75 | | -- **Analytics pipelines**: Clickstream, telemetry, server logs |
76 | | -- **Data warehousing**: ETL into queryable Iceberg tables |
77 | | -- **Event processing**: Mobile/IoT with enrichment |
78 | | -- **Ecommerce analytics**: User events, purchases, views |
| 64 | +Just archival / external tools (Spark, Athena)? |
| 65 | + → R2 raw files (Parquet/JSON) ✅ simple, partitioned files ❌ no built-in SQL |
| 66 | +``` |
79 | 67 |
|
80 | | -## Reading Order |
| 68 | +## Critical Behaviors (read before building) |
81 | 69 |
|
82 | | -**New to Pipelines?** Start here: |
83 | | -1. [configuration.md](./configuration.md) - Setup streams, sinks, pipelines |
84 | | -2. [api.md](./api.md) - Send events, TypeScript types, SQL functions |
85 | | -3. [patterns.md](./patterns.md) - Best practices, integrations, complete example |
86 | | -4. [gotchas.md](./gotchas.md) - Critical warnings, troubleshooting |
| 70 | +These are non-obvious and prevent most failures — see [gotchas.md](gotchas.md) for detail. |
87 | 71 |
|
88 | | -**Task-based routing:** |
89 | | -- Setup pipeline → [configuration.md](./configuration.md) |
90 | | -- Send/query data → [api.md](./api.md) |
91 | | -- Implement pattern → [patterns.md](./patterns.md) |
92 | | -- Debug issue → [gotchas.md](./gotchas.md) |
| 72 | +- **Everything is immutable after creation** — stream schema, pipeline SQL, sink config. To change, delete and recreate. |
| 73 | +- **Sinks create their own table** — they cannot target an existing Iceberg table. |
| 74 | +- **`__ingest_ts` is added automatically** (TIMESTAMP, partitioned by day). Don't define it in your schema. |
| 75 | +- **Data isn't queryable immediately** — first flush takes **3–7 minutes** (warm-up + table creation) even with a short roll interval. |
| 76 | +- **Schema validation is deferred** — invalid events are accepted then silently dropped. Monitor via GraphQL error metrics. |
| 77 | +- **Binding field renamed `pipeline` → `stream`** (June 2026); old field still accepted. |
93 | 78 |
|
94 | | -## In This Reference |
| 79 | +## Reading Order |
95 | 80 |
|
96 | | -- [configuration.md](./configuration.md) - wrangler.jsonc bindings, schema definition, sink options, CLI commands |
97 | | -- [api.md](./api.md) - Pipeline binding interface, send() method, HTTP ingest, SQL function reference |
98 | | -- [patterns.md](./patterns.md) - Fire-and-forget, schema validation with Zod, integrations, performance tuning |
99 | | -- [gotchas.md](./gotchas.md) - Silent validation failures, immutable pipelines, latency expectations, limits |
| 81 | +1. [configuration.md](configuration.md) — schema, streams, sinks, pipelines (CLI + REST + Terraform), bindings |
| 82 | +2. [api.md](api.md) — `send()`, HTTP ingest, REST API, pipeline SQL, lifecycle states |
| 83 | +3. [patterns.md](patterns.md) — fire-and-forget, validation, Logpush, observability, end-to-end |
| 84 | +4. [gotchas.md](gotchas.md) — silent drops, immutability, REST≠CLI field names |
100 | 85 |
|
101 | 86 | ## See Also |
102 | 87 |
|
103 | | -- [r2](../r2/) - R2 storage backend for sinks |
104 | | -- [queues](../queues/) - Compare with Queues for async processing |
105 | | -- [workers](../workers/) - Worker runtime for event ingestion |
| 88 | +- [r2-data-catalog](../r2-data-catalog/) — Iceberg sink destination |
| 89 | +- [r2-sql](../r2-sql/) — query the ingested data |
| 90 | +- [r2](../r2/) · [queues](../queues/) · [workers](../workers/) |
0 commit comments