Website Content Crawler

Crawl a small same-origin website slice and return clean page text, titles, final URLs, and discovered internal links for RAG, audits, and agent context.

Content extraction | v0.1.0 | ~3 credits/run | Apify Store | Source on GitHub

Overview

Website Content Crawler gives agents a controlled way to turn a public website into clean text. It starts from one URL, follows same-origin links up to a page budget, renders each page through Better Fetch, removes navigation-heavy markup, and returns pages that can be saved into RAG, audit, or monitoring workflows.
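The traversal described above can be pictured as a breadth-first walk with a page budget. The following is a minimal sketch, not the tool's actual implementation: `fetch_links` is a hypothetical callable that returns the hrefs found on a page, the `SITE` link graph is made-up data, and the origin comparison is simplified to scheme plus host.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit


def origin_of(url: str) -> tuple[str, str]:
    """Return (scheme, host[:port]) lowercased; a simple stand-in for an origin."""
    parts = urlsplit(url)
    return parts.scheme.lower(), parts.netloc.lower()


def crawl(start_url, fetch_links, max_pages=3):
    """Breadth-first crawl that stays on start_url's origin and stops at max_pages.

    fetch_links is a hypothetical callable: url -> list of hrefs found on that page.
    """
    start_origin = origin_of(start_url)
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in fetch_links(url):
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#", 1)[0]
            if origin_of(absolute) != start_origin or absolute in seen:
                continue
            seen.add(absolute)
            queue.append(absolute)
    return visited


# Toy link graph standing in for real pages (made-up data).
SITE = {
    "https://example.com/": ["/docs", "/blog", "https://other.example.org/"],
    "https://example.com/docs": ["/docs/api", "/"],
    "https://example.com/blog": [],
    "https://example.com/docs/api": [],
}

order = crawl("https://example.com/", lambda u: SITE.get(u, []), max_pages=3)
print(order)
# ['https://example.com/', 'https://example.com/docs', 'https://example.com/blog']
```

With a budget of 3, the walk stops before reaching /docs/api, and the link to the other host is never queued.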

Last validated: Jul 3, 2026


Input

url (string, uri, required)

The starting page URL

max_pages (integer, default: 3)

Maximum same-origin pages to fetch

max_chars_per_page (integer, default: 12000)

Truncate each extracted page text to this many characters

include_path_prefixes (string[])

Optional path prefixes to follow, e.g. /docs or /blog
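One way to keep the input fields and their defaults straight on the client side is a small dataclass. This is a sketch under assumptions: `CrawlInput`, `path_allowed`, and `truncate` are hypothetical helper names, and the prefix check uses a plain `startswith`, which may differ from how the tool matches prefixes.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class CrawlInput:
    url: str
    max_pages: int = 3                # documented default
    max_chars_per_page: int = 12000   # documented default
    include_path_prefixes: list[str] = field(default_factory=list)

    def to_payload(self) -> dict:
        """Build the JSON input, omitting the optional prefix list when empty."""
        payload = {
            "url": self.url,
            "max_pages": self.max_pages,
            "max_chars_per_page": self.max_chars_per_page,
        }
        if self.include_path_prefixes:
            payload["include_path_prefixes"] = list(self.include_path_prefixes)
        return payload


def path_allowed(url: str, prefixes: list[str]) -> bool:
    """Assumed semantics: no prefixes means follow everything; otherwise startswith."""
    if not prefixes:
        return True
    path = urlsplit(url).path or "/"
    return any(path.startswith(p) for p in prefixes)


def truncate(text: str, max_chars: int) -> str:
    """Cut text to at most max_chars characters."""
    return text[:max_chars]


print(CrawlInput("https://example.com", include_path_prefixes=["/docs"]).to_payload())
```

Note that a plain prefix test also accepts a path such as /docsify for the prefix /docs; whether the tool behaves the same way is not stated in this page.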

Output

count (integer, required)

Number of pages returned

pages (object[], required)

Fetched pages in crawl order

origin (string, required)

Crawled origin

start_url (string, required)

Original URL
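A caller may want to sanity-check a result against the four documented top-level fields before using it. The sketch below is illustrative only: `check_output` is a hypothetical helper, the sample result is made up, the key inside the page object is a guess, and the check that count equals the number of pages is an assumption based on the field description.

```python
# Documented top-level fields and their JSON types.
REQUIRED_FIELDS = {"count": int, "pages": list, "origin": str, "start_url": str}


def check_output(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the documented fields look right."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif type(result[name]) is not expected:
            problems.append(f"{name} should be {expected.__name__}")
    if problems:
        return problems
    if any(not isinstance(page, dict) for page in result["pages"]):
        problems.append("every entry in pages should be an object")
    # Assumption: count equals len(pages), since count is "number of pages returned".
    if result["count"] != len(result["pages"]):
        problems.append("count does not match len(pages)")
    return problems


# Made-up result; the key inside the page object is hypothetical.
sample = {
    "count": 1,
    "pages": [{"title": "Example Domain"}],
    "origin": "https://example.com",
    "start_url": "https://example.com",
}
print(check_output(sample))  # []
```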

Examples

example-domain

{
  "url": "https://example.com",
  "max_pages": 1,
  "max_chars_per_page": 2000
}

Use cases

RAG ingestion

Collect a small, targeted set of website pages and transform them into normalized text before loading them into a vector store or knowledge base.
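Before loading into a vector store, page text is usually split into overlapping chunks. The following is a minimal sketch of that step, assuming each page object exposes its text and final URL under the hypothetical keys `text` and `final_url` (the exact page field names are not listed in this page). The chunk size and overlap are arbitrary illustration values.

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def pages_to_records(result: dict, size: int = 800, overlap: int = 100) -> list[dict]:
    """Turn crawl output into records ready for embedding."""
    records = []
    for page in result.get("pages", []):
        url = page.get("final_url", "")  # hypothetical key
        text = page.get("text", "")      # hypothetical key
        for n, chunk in enumerate(chunk_text(text, size, overlap)):
            records.append({"id": f"{url}#chunk-{n}", "source": url, "text": chunk})
    return records


# Made-up crawl result with one 1000-character page.
sample = {"pages": [{"final_url": "https://example.com/", "text": "x" * 1000}]}
records = pages_to_records(sample)
print([len(r["text"]) for r in records])  # [800, 300]
```

Keeping the source URL on every record lets retrieved chunks be traced back to the page they came from.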

Site audits

Let an agent inspect visible titles, final URLs, word counts, and internal links without manually copying content from each page.

FAQ

Does Website Content Crawler crawl the whole site?

No. It intentionally keeps a small page budget so agent calls stay predictable. Increase max_pages only for focused same-origin crawls.

Does the crawler follow external links?

No. Traversal stays on the same origin as the starting URL. External links are never followed; they appear only where they are visible in a page's extracted text or markup.
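For readers unsure what counts as the same origin, the usual web definition compares scheme, host, and port. The sketch below follows that definition; it is an assumption that the crawler applies exactly this comparison, and `origin` and `same_origin` are hypothetical helper names.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, Optional[int]]:
    """Return (scheme, host, port), filling in the default port when none is given."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


print(same_origin("https://example.com/docs", "https://example.com:443/blog"))  # True
print(same_origin("https://example.com", "http://example.com"))                # False
print(same_origin("https://example.com", "https://docs.example.com"))          # False
```

Under this definition a subdomain such as docs.example.com is a different origin, so links to it would be treated as external.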

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: website_content_crawler

REST

curl -sS -X POST "https://betterfetch.co/api/tools/website_content_crawler/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"url":"https://example.com","max_pages":1,"max_chars_per_page":2000}}'
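The same call can be made from Python with only the standard library. This is a sketch mirroring the curl command above: the endpoint, headers, and `input` wrapper come from that command, while `build_request`, `run_tool`, and the 60-second timeout are assumptions. The shape of the response body is not documented here, so `run_tool` simply returns the decoded JSON.

```python
import json
import urllib.request

ENDPOINT = "https://betterfetch.co/api/tools/website_content_crawler/run"


def build_request(api_key: str, tool_input: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_tool(api_key: str, tool_input: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response (timeout value is an assumption)."""
    request = build_request(api_key, tool_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request(
    "bf_your_key_here",
    {"url": "https://example.com", "max_pages": 1, "max_chars_per_page": 2000},
)
print(request.get_method(), request.full_url)
```

Calling `run_tool("bf_your_key_here", {...})` with a real key would perform the network request; the snippet above only builds it.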

Run locally

git clone https://github.com/better-fetch/website-content-crawler && cd website-content-crawler && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://example.com","max_pages":1,"max_chars_per_page":2000}'