Firecrawl is a web scraping and data extraction platform designed to convert websites into LLM-ready structured data. The system provides a RESTful API, client SDKs, and a distributed worker architecture to handle scraping, crawling, searching, and AI-powered data extraction at scale.
This document provides a high-level overview of the Firecrawl system architecture, its core components, and how they interact.
Firecrawl solves the problem of extracting clean, structured web data for AI applications by providing:
| Capability | Endpoint | Description |
|---|---|---|
| Single Page Scraping | /v2/scrape | Convert any URL to markdown, HTML, screenshots, or structured JSON |
| Multi-Page Crawling | /v2/crawl | Recursively scrape entire websites with intelligent link filtering |
| URL Discovery | /v2/map | Discover all URLs on a website via sitemaps, index queries, or search |
| Web Search | /v2/search | Search the web and scrape results in a single API call |
| AI Extraction | /v2/extract | LLM-powered structured data extraction with schema validation |
| Autonomous Agent | /v2/agent | Autonomous research agent that navigates and extracts data |
| Remote Browser | /v2/browser | Remote browser sessions with CDP access and code execution |
| Batch Operations | /v2/batch/scrape | Asynchronous bulk scraping of multiple URLs |
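The "intelligent link filtering" used by /v2/crawl is not specified in this overview. The sketch below shows one common approach, and it rests on three assumptions: a discovered link is kept only if it stays on the start URL's host, it matches at least one include pattern (when any are given), and it matches no exclude pattern. The helpers shouldCrawl and filterLinks and the option names are hypothetical. They are not Firecrawl's API.

```typescript
// Hypothetical options for filtering discovered links during a crawl.
interface LinkFilterOptions {
  includePatterns?: RegExp[]; // if non-empty, a path must match at least one
  excludePatterns?: RegExp[]; // a path matching any of these is skipped
}

// Parse a URL (optionally relative to a base), returning null if invalid.
function tryParseUrl(link: string, base?: string): URL | null {
  try {
    return new URL(link, base);
  } catch {
    return null;
  }
}

// Decide whether a discovered link should be queued for crawling.
function shouldCrawl(startUrl: string, link: string, opts: LinkFilterOptions = {}): boolean {
  const base = tryParseUrl(startUrl);
  if (!base) return false;
  // Resolve relative links such as "/blog/post" against the start URL.
  const candidate = tryParseUrl(link, base.href);
  if (!candidate) return false;
  // Only follow http(s) links on the same host as the start URL.
  if (candidate.protocol !== "http:" && candidate.protocol !== "https:") return false;
  if (candidate.hostname !== base.hostname) return false;

  const path = candidate.pathname;
  const { includePatterns = [], excludePatterns = [] } = opts;
  if (excludePatterns.some((re) => re.test(path))) return false;
  if (includePatterns.length > 0 && !includePatterns.some((re) => re.test(path))) return false;
  return true;
}

// Filter a page's links, drop #fragments, and de-duplicate the result.
function filterLinks(startUrl: string, links: string[], opts: LinkFilterOptions = {}): string[] {
  const seen = new Set<string>();
  for (const link of links) {
    if (!shouldCrawl(startUrl, link, opts)) continue;
    const url = new URL(link, startUrl);
    url.hash = "";
    seen.add(url.toString());
  }
  return Array.from(seen);
}
```

This sketch leaves out several things a production crawler would also handle, such as subdomain policy, robots.txt, and depth limits.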
The system is built to be self-hosted or used as a cloud service, with extensive configuration options for proxies, authentication, rate limiting, and content processing.
Sources: README.md40-83 apps/api/package.json1-159
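The sketch below shows how a client might address the endpoints in the table. It makes four assumptions. Requests are JSON POSTs authenticated with a bearer API key. The hosted base URL is https://api.firecrawl.dev. A self-hosted instance substitutes its own host, and http://localhost:3002 stands in for one here. /v2/agent and /v2/browser are left out because their request shapes are not described in this overview. The buildRequest helper is hypothetical and only assembles the request without sending it.

```typescript
// Endpoints from the capability table assumed to take a JSON POST body.
type Endpoint = "/v2/scrape" | "/v2/crawl" | "/v2/map" | "/v2/search" | "/v2/extract" | "/v2/batch/scrape";

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumed base URLs: the hosted service, or a local self-hosted instance (assumed port).
const CLOUD_BASE_URL = "https://api.firecrawl.dev";
const LOCAL_BASE_URL = "http://localhost:3002";

// Hypothetical helper: assemble (but do not send) an authenticated JSON request.
function buildRequest(
  baseUrl: string,
  endpoint: Endpoint,
  apiKey: string,
  payload: Record<string, unknown>,
): PreparedRequest {
  // Strip trailing slashes so "host/" + "/v2/..." does not produce "//".
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: root + endpoint,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(payload),
  };
}
```

For example, buildRequest(CLOUD_BASE_URL, "/v2/scrape", "fc-YOUR-API-KEY", { url: "https://example.com", formats: ["markdown"] }) yields an object whose url, method, headers, and body can be passed to fetch. Switching between the cloud service and a self-hosted deployment then only changes the first argument.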
Architecture Diagram: Core System Components and Data Flow
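The architecture follows a client-server-worker pattern. Clients (the SDKs or direct HTTP callers) send requests to the API server. The API server authenticates each request through parseApi and dispatches jobs, and background workers then process the queued jobs.

Sources: apps/api/package.json24-39 apps/js-sdk/firecrawl/src/index.ts26-48 apps/python-sdk/firecrawl/client.py82-134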
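The toy, in-process sketch below illustrates the client-server-worker split. The "API" side enqueues a job and returns an ID immediately, a worker drains the queue, and the client polls for the job's status. InMemoryQueue and runWorker are hypothetical. The sketch uses a plain array where Firecrawl uses BullMQ backed by Redis, and it does not reproduce BullMQ's API.

```typescript
type JobStatus = "queued" | "active" | "completed" | "failed";

interface Job {
  id: string;
  url: string;
  status: JobStatus;
  result?: string;
  error?: string;
}

// Hypothetical in-memory stand-in for a Redis-backed job queue.
class InMemoryQueue {
  private pending: Job[] = [];
  private jobs = new Map<string, Job>();
  private nextId = 1;

  // API-server side: accept work and return an ID without waiting for it.
  enqueue(url: string): string {
    const job: Job = { id: `job-${this.nextId++}`, url, status: "queued" };
    this.pending.push(job);
    this.jobs.set(job.id, job);
    return job.id;
  }

  // Client side: poll the status of a previously submitted job.
  getStatus(id: string): Job | undefined {
    return this.jobs.get(id);
  }

  // Worker side: claim the oldest pending job, if any.
  take(): Job | undefined {
    const job = this.pending.shift();
    if (job) job.status = "active";
    return job;
  }
}

// A worker loop that processes pending jobs until the queue is empty.
async function runWorker(queue: InMemoryQueue, handler: (url: string) => Promise<string>): Promise<void> {
  while (true) {
    const job = queue.take();
    if (!job) break;
    try {
      job.result = await handler(job.url);
      job.status = "completed";
    } catch (err) {
      job.error = err instanceof Error ? err.message : String(err);
      job.status = "failed";
    }
  }
}
```

Returning an ID right away keeps API latency independent of how long a scrape or crawl takes. This is why long-running operations are handed to workers.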
API server and runtime:

| Component | Technology | File Reference |
|---|---|---|
| API Server | Express 4.22.0 | apps/api/package.json118 |
| Runtime | Node.js >= 22.0.0 | apps/js-sdk/firecrawl/package.json63 |
| Language | TypeScript | apps/api/package.json152 |
| Process Manager | tsc-watch | apps/api/package.json9 |

Job processing and storage:

| Component | Technology | Purpose | File Reference |
|---|---|---|---|
| Job Queue | BullMQ 5.56.7 | Distributed job processing | apps/api/package.json109 |
| Queue Backend | ioredis 5.6.1 | Redis client | apps/api/package.json125 |
| Worker Orchestration | NuQ System | Specialized worker for high-throughput scraping | apps/api/package.json26-33 |
| Database | PostgreSQL / pg 8.16.3 | Relational storage | apps/api/package.json138 |

AI and LLM integration:

| Component | Technology | Purpose | File Reference |
|---|---|---|---|
| AI SDK | Vercel AI SDK 6.0.86 | Unified LLM interface | apps/api/package.json101 |
| Schema Validation | Zod 4.1.12 | Type-safe data validation | apps/api/package.json158 |
| Providers | Anthropic, OpenAI, Google, Groq, etc. | Supported LLM providers | apps/api/package.json79-86 |
Sources: apps/api/package.json78-159 apps/api/pnpm-lock.yaml1-240
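In the AI extraction path, the model's JSON output must be checked against the caller's schema before it is returned. Firecrawl uses Zod for this. To keep the example dependency-free, the sketch below substitutes a small hand-written validator driven by a hypothetical Schema/FieldSpec description. It shows the idea (parse the output, check required fields and types, report errors). It is not Zod's API or Firecrawl's code.

```typescript
// Hypothetical, minimal schema description: field name -> expected JSON type.
type FieldType = "string" | "number" | "boolean";
interface FieldSpec {
  type: FieldType;
  required?: boolean;
}
type Schema = Record<string, FieldSpec>;

type ValidationResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; errors: string[] };

// Parse raw model output and check it against the schema (stand-in for Zod).
function validateExtraction(raw: string, schema: Schema): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["output is not a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  const errors: string[] = [];
  for (const [name, spec] of Object.entries(schema)) {
    const value = obj[name];
    if (value === undefined || value === null) {
      if (spec.required) errors.push(`missing required field "${name}"`);
      continue;
    }
    if (typeof value !== spec.type) {
      errors.push(`field "${name}" should be ${spec.type}, got ${typeof value}`);
    }
  }
  return errors.length === 0 ? { ok: true, data: obj } : { ok: false, errors };
}
```

The validator collects every error rather than stopping at the first. The full list can then be reported to the caller or fed back to the model on a retry.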
Request Flow Diagram: Scrape Operation
1. The API server uses parseApi to validate the Authorization header and extract the user/team context (apps/api/src/lib/parseApi.ts1-23).
2. The scrape method is called; SDK users reach it through methods such as Firecrawl.scrape (apps/python-sdk/firecrawl/v2/client.py136-162).
3. scrapeURL fetches the content and applies transformers.
4. A Document containing the requested formats (markdown, html, json, etc.) is returned (apps/python-sdk/firecrawl/v2/types.py60-130).

Sources: apps/js-sdk/firecrawl/src/index.ts27-48 apps/python-sdk/firecrawl/v2/client.py136-190 apps/python-sdk/firecrawl/v2/types.py60-130
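The sketch below illustrates the authentication step of the flow: it extracts a key from an "Authorization: Bearer <key>" header and resolves it to a team. It is not the contents of parseApi.ts. The authenticate function, the fc- prefix check, and the in-memory teamsByKey lookup are all assumptions made for the example.

```typescript
interface AuthContext {
  apiKey: string;
  teamId: string;
}

type AuthResult =
  | { ok: true; ctx: AuthContext }
  | { ok: false; status: 401; message: string };

// Hypothetical: extract the bearer token and resolve it to a team.
function authenticate(
  authorization: string | undefined,
  teamsByKey: ReadonlyMap<string, string>, // assumed key -> team lookup
): AuthResult {
  if (!authorization) {
    return { ok: false, status: 401, message: "Missing Authorization header" };
  }
  // Expect exactly "Bearer <token>"; the scheme is matched case-insensitively.
  const match = /^Bearer\s+(\S+)$/i.exec(authorization.trim());
  if (!match) {
    return { ok: false, status: 401, message: "Malformed Authorization header" };
  }
  const apiKey = match[1];
  // Assumption: keys carry an "fc-" prefix.
  if (!apiKey.startsWith("fc-")) {
    return { ok: false, status: 401, message: "Unrecognized API key format" };
  }
  const teamId = teamsByKey.get(apiKey);
  if (teamId === undefined) {
    return { ok: false, status: 401, message: "Invalid API key" };
  }
  return { ok: true, ctx: { apiKey, teamId } };
}
```

Every failure path in authenticate returns a 401 result instead of throwing. The request handler can then stop before any job is dispatched.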
Firecrawl provides official SDKs for major languages to simplify integration.
The @mendable/firecrawl-js package provides a unified Firecrawl class that supports both modern v2 methods and legacy v1 methods.
- Entry point: apps/js-sdk/firecrawl/src/index.ts (apps/js-sdk/firecrawl/src/index.ts27-48)
- FirecrawlClient (v2): apps/js-sdk/firecrawl/src/index.ts9
- FirecrawlAppV1 (legacy v1): apps/js-sdk/firecrawl/src/index.ts17

The firecrawl-py package provides synchronous (Firecrawl) and asynchronous (AsyncFirecrawl) clients.

- Entry point: apps/python-sdk/firecrawl/__init__.py (apps/python-sdk/firecrawl/__init__.py9-18)
- FirecrawlClient: apps/python-sdk/firecrawl/v2/client.py82
- AsyncFirecrawlClient: apps/python-sdk/firecrawl/v2/client_async.py67

Sources: apps/js-sdk/firecrawl/package.json1-38 apps/python-sdk/firecrawl/__init__.py1-87
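A small facade can illustrate the "unified class" idea behind the JavaScript SDK. The default methods follow the v2 API, and a lazily created legacy client is reachable through a separate property. Everything below is hypothetical and only mirrors the structure described above. This includes the V2Client/V1Client stubs, the v1 accessor, and the method names. The real exports are in apps/js-sdk/firecrawl/src/index.ts.

```typescript
// Hypothetical stand-in for the v2 client.
class V2Client {
  constructor(protected readonly apiKey: string) {}
  async scrape(url: string): Promise<string> {
    return `v2 scrape of ${url}`; // placeholder for a real v2 request
  }
}

// Hypothetical stand-in for the legacy v1 client.
class V1Client {
  constructor(readonly apiKey: string) {}
  async scrapeUrl(url: string): Promise<string> {
    return `v1 scrape of ${url}`; // placeholder for a real v1 request
  }
}

// Facade: v2 methods are inherited directly; v1 is created on first access.
class UnifiedClient extends V2Client {
  private legacy?: V1Client;

  get v1(): V1Client {
    if (!this.legacy) this.legacy = new V1Client(this.apiKey);
    return this.legacy;
  }
}
```

Because the legacy client is created lazily, code that only calls v2 methods never constructs it. Older call sites can still reach it through one property.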
Firecrawl is designed for flexible deployment using Docker.
The system splits background tasks across multiple worker processes, which can be scaled independently of the API server.
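The sketch below shows why spreading work across several consumers helps. Each consumer pulls from one shared queue, so I/O-bound work such as fetching pages finishes sooner as consumers are added. The sketch runs several async consumers inside one process as a stand-in for separate worker processes. drainWithWorkers and its parameters are hypothetical.

```typescript
// Run `workerCount` consumers that share one list of tasks.
async function drainWithWorkers<T, R>(
  tasks: T[],
  workerCount: number,
  handle: (task: T, workerId: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(tasks.length);
  let next = 0; // index of the next unclaimed task, shared by all workers

  async function worker(workerId: number): Promise<void> {
    // Claiming needs no lock: the check-and-increment runs synchronously
    // on JavaScript's single thread, between awaits.
    while (next < tasks.length) {
      const index = next++;
      results[index] = await handle(tasks[index], workerId);
    }
  }

  // At least one worker, and never more workers than tasks.
  const count = Math.max(1, Math.min(workerCount, tasks.length));
  await Promise.all(Array.from({ length: count }, (_, i) => worker(i)));
  return results;
}
```

Results are written by task index, so their order matches the input even though workers finish at different times.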
Developers can run the full stack locally with the provided harness: tsx src/harness.ts starts the development environment (apps/api/package.json8).

Sources: apps/api/package.json6-45 apps/test-suite/package.json1-12