LangChain primitives backed by the Crawlbase Crawling API.
Three drop-in classes:
CrawlbaseLoader— aBaseLoaderthat fetches a list of URLs and returns clean MarkdownDocuments ready for chunking.CrawlbaseTool— aBaseToolthat lets an LLM agent fetch a live web page mid-conversation.CrawlbaseRetriever— aBaseRetrieverthat fetches a fixed set of seed URLs and filters by query.
All three return GitHub-flavored Markdown via Crawlbase's format=md parameter, so you skip the HTML-stripping step entirely.
pip install langchain-crawlbaseGet a token from your Crawlbase dashboard. Use your normal token for static pages, or your JavaScript token for SPA / JS-rendered pages — Crawlbase routes the request automatically based on which token you send.
export CRAWLBASE_TOKEN=your_tokenimport os
from langchain_crawlbase import CrawlbaseLoader
loader = CrawlbaseLoader(
token=os.environ["CRAWLBASE_TOKEN"],
urls=[
"https://en.wikipedia.org/wiki/Large_language_model",
"https://en.wikipedia.org/wiki/Retrieval-augmented_generation",
],
)
docs = loader.load()
print(docs[0].page_content[:500])
print(docs[0].metadata) # {'source': '...', 'pc_status': 200, ...}For SPA pages, just use your JavaScript token instead — same interface:
loader = CrawlbaseLoader(
token=os.environ["CRAWLBASE_JS_TOKEN"],
urls=["https://some-spa-site.com/page"],
)import os
from langchain_crawlbase import CrawlbaseTool
tool = CrawlbaseTool(token=os.environ["CRAWLBASE_TOKEN"])
# Use directly:
markdown = tool.invoke({"url": "https://example.com"})
# Or bind to an LLM:
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-opus-4-7")
agent_llm = llm.bind_tools([tool])import os
from langchain_crawlbase import CrawlbaseRetriever
retriever = CrawlbaseRetriever(
token=os.environ["CRAWLBASE_TOKEN"],
urls=[
"https://crawlbase.com/docs/crawling-api",
"https://crawlbase.com/docs/crawling-api#parameters",
],
)
docs = retriever.invoke("how do I render JavaScript pages")v0.1 uses simple substring matching. For semantic retrieval, pair
CrawlbaseLoaderwith a vector store of your choice.
Pass any Crawlbase API parameter via extra_params:
loader = CrawlbaseLoader(
token=token,
urls=["https://example.com"],
extra_params={"country": "US", "device": "mobile"},
)pip install -e ".[dev]"
pytest tests/unit
ruff check .Integration tests are gated on CRAWLBASE_TOKEN:
CRAWLBASE_TOKEN=xxx pytest tests/integrationMIT — © Crawlbase Team. Contact: support@crawlbase.com