Website Content Crawler
Crawl a small same-origin website slice and return clean page text, titles, final URLs, and discovered internal links for RAG, audits, and agent context.
Overview
Website Content Crawler gives agents a controlled way to turn a public website into clean text. It starts from one URL, follows same-origin links up to a page budget, renders each page through Better Fetch, removes navigation-heavy markup, and returns pages that can be saved into RAG, audit, or monitoring workflows.
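The traversal described above can be pictured as a budgeted breadth-first walk. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: `crawl`, `origin_of`, `fetch_links`, and the in-memory `SITE` link map are hypothetical names standing in for the real rendering and link-extraction steps.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical in-memory site: URL -> links found on that page.
# A real crawler would fetch and parse each page instead.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://other.example.org/"],
    "https://example.com/docs": ["https://example.com/docs/api", "https://example.com/"],
    "https://example.com/docs/api": [],
}


def origin_of(url: str) -> Tuple[str, str]:
    """Return (scheme, host[:port]) as a simple origin key."""
    parts = urlsplit(url)
    return (parts.scheme, parts.netloc.lower())


def crawl(
    start_url: str,
    max_pages: int = 3,
    fetch_links: Callable[[str], Optional[List[str]]] = SITE.get,
) -> List[str]:
    """Visit same-origin pages breadth-first until the page budget is spent."""
    origin = origin_of(start_url)
    queue = deque([start_url])
    seen = {start_url}
    visited: List[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url) or []:
            # Skip external links and pages that are already queued.
            if origin_of(link) == origin and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

With the hypothetical `SITE` map, `crawl("https://example.com/", max_pages=2)` visits the start page and `/docs`, then stops because the budget is exhausted; the link to `other.example.org` is never queued.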
Last validated: Jul 3, 2026
Playground
Input
url: string (uri), required. The starting page URL.
max_pages: integer, default: 3. Maximum same-origin pages to fetch.
max_chars_per_page: integer, default: 12000. Truncate each extracted page text to this many characters.
include_path_prefixes: string[]. Optional path prefixes to follow, e.g. /docs or /blog.
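To illustrate how include_path_prefixes and max_chars_per_page might behave, here is a small sketch. The helper names `should_follow` and `truncate_text` are hypothetical, and the matching rule (a plain string prefix test on the URL path, with an empty list meaning "follow everything") is an assumption; the source does not specify how the tool matches prefixes.

```python
from typing import Optional, Sequence
from urllib.parse import urlsplit


def should_follow(url: str, include_path_prefixes: Optional[Sequence[str]] = None) -> bool:
    """Return True when the URL path starts with one of the given prefixes.

    Assumption: a missing or empty prefix list places no restriction,
    and matching is a plain string prefix test on the URL path.
    """
    if not include_path_prefixes:
        return True
    path = urlsplit(url).path or "/"
    return any(path.startswith(prefix) for prefix in include_path_prefixes)


def truncate_text(text: str, max_chars_per_page: int = 12000) -> str:
    """Cut extracted page text to at most max_chars_per_page characters."""
    return text[:max_chars_per_page]
```

Under these assumptions, `should_follow("https://example.com/docs/intro", ["/docs"])` is true, while a `/pricing` page is skipped when the prefixes are `/docs` and `/blog`.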
Output
count: integer, required. Number of pages returned.
pages: object[], required. Fetched pages in crawl order.
origin: string, required. Crawled origin.
start_url: string, required. Original URL.
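A caller may want to sanity-check a response against the output fields listed above before using it. The sketch below assumes the response body is JSON with those four top-level keys. The per-page keys (`url`, `title`, `text`) and the function name `parse_crawl_result` are hypothetical, since the schema above does not spell out the page object.

```python
import json

# Hypothetical response body. Only the four top-level keys come from the
# documented output schema; the per-page keys are assumed for illustration.
SAMPLE_RESPONSE = """
{
  "count": 1,
  "origin": "https://example.com",
  "start_url": "https://example.com",
  "pages": [
    {"url": "https://example.com/", "title": "Example Domain", "text": "Example Domain ..."}
  ]
}
"""

REQUIRED_KEYS = ("count", "pages", "origin", "start_url")


def parse_crawl_result(raw: str) -> dict:
    """Parse the JSON body and check the documented required fields."""
    result = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # count is described as the number of pages returned, so it should
    # agree with the length of the pages array.
    if result["count"] != len(result["pages"]):
        raise ValueError("count does not match number of pages")
    return result
```

Because pages are returned in crawl order, `result["pages"][0]` corresponds to the starting URL in this sketch.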
Examples
example-domain
{
"url": "https://example.com",
"max_pages": 1,
"max_chars_per_page": 2000
}
Use cases
FAQ
Does Website Content Crawler crawl the whole site?
No. It intentionally keeps a small page budget so agent calls stay predictable. Increase max_pages only for focused same-origin crawls.
Does the crawler follow external links?
No. Traversal stays on the same origin as the starting URL. External links are not followed; they appear only in the extracted page text or markup context when visible.
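For readers unsure what "same origin" covers, the sketch below uses the standard web definition (scheme, host, and port must all match). Whether the tool applies exactly this rule, for example to a `www.` subdomain, is not stated in the source, so treat it as an assumption. The names `origin` and `split_links` are hypothetical.

```python
from urllib.parse import urlsplit

# Default ports, so that https://example.com and https://example.com:443 compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def split_links(start_url: str, links: list) -> tuple:
    """Partition links into (internal, external) relative to start_url."""
    base = origin(start_url)
    internal = [link for link in links if origin(link) == base]
    external = [link for link in links if origin(link) != base]
    return internal, external
```

Under this definition, `http://example.com/a` and `https://www.example.com/` both count as external to `https://example.com`, because the scheme or the host differs.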
Use it anywhere
MCP (Claude, Cursor, any client)
# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"
# Then ask for the tool by name: website_content_crawler
REST
curl -sS -X POST "https://betterfetch.co/api/tools/website_content_crawler/run" \
-H "Authorization: Bearer bf_your_key_here" \
-H "Content-Type: application/json" \
-d '{"input": {"url":"https://example.com","max_pages":1,"max_chars_per_page":2000}}'
Run locally
git clone https://github.com/better-fetch/website-content-crawler && cd website-content-crawler && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://example.com","max_pages":1,"max_chars_per_page":2000}'
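For callers who prefer Python over curl, the REST request from the REST section could be assembled roughly as follows. This is a sketch using only the standard library. The endpoint, headers, and body shape are copied from the curl example, while the helper name `build_request` and the expectation of a JSON response are assumptions.

```python
import json
import urllib.request

# Endpoint taken from the curl example in the REST section.
API_URL = "https://betterfetch.co/api/tools/website_content_crawler/run"


def build_request(api_key: str, url: str, max_pages: int = 3,
                  max_chars_per_page: int = 12000) -> urllib.request.Request:
    """Build (but do not send) the POST request for the crawler tool."""
    body = {"input": {"url": url, "max_pages": max_pages,
                      "max_chars_per_page": max_chars_per_page}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending it would look like this (needs a valid key and network access;
# a JSON response body is assumed):
# request = build_request("bf_your_key_here", "https://example.com", 1, 2000)
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.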