Cheerio Scraper
Scrape one raw HTML page over HTTP into structured metadata, readable text, links, images, JSON-LD types, response metadata, and optional simple selector matches.
Overview
Cheerio Scraper is the fast HTTP-first extractor in the Better Fetch marketplace. Provide a public URL and the tool retrieves the raw response without launching a browser, then normalizes page metadata, readable text, link and image inventories, JSON-LD types, content type, status, byte size, and lightweight selector matches.
Last validated: Jul 3, 2026
Input
url (string, uri; required): The public http(s) page URL to scrape over raw HTTP.
max_links (integer; default: 25): Maximum links to return.
selectors (string[]): Optional simple selectors to extract text from, e.g. h1, main, article, .price, #content.
max_images (integer; default: 10): Maximum images to return.
max_text_chars (integer; default: 12000): Maximum readable text characters to return.
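As a rough illustration of how the input fields above fit together, the following Python sketch assembles an input object on the caller's side. The helper name build_input is hypothetical and not part of Better Fetch. The defaults are copied from the field list above, while the non-negative check on the limits is an assumption, since the schema only states that they are integers.

```python
from urllib.parse import urlparse

# Defaults copied from the Input field list above.
DEFAULTS = {"max_links": 25, "max_images": 10, "max_text_chars": 12000}


def build_input(url, selectors=None, **limits):
    """Return an input dict for cheerio_scraper (hypothetical client-side helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")

    unknown = set(limits) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    # Caller-supplied limits override the documented defaults.
    payload = {"url": url, **DEFAULTS, **limits}
    for name in DEFAULTS:
        value = payload[name]
        # Assumption: limits must be non-negative integers.
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")

    # selectors is optional, so it is only included when provided.
    if selectors:
        payload["selectors"] = list(selectors)
    return payload
```

Sending the defaults explicitly is optional. It is done here only so that the resulting object shows every limit that will apply to the run.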
Output
url (string; required): Requested URL
text (string; required): Readable page text
links (object[]): Links discovered in page order
title (string; required): Page title
images (object[]): Images discovered in page order
status (integer): HTTP status code returned by Better Fetch
final_url (string; required): Final URL after redirects
html_bytes (integer): Raw HTML byte length
word_count (integer; required): Words in extracted text
description (string): Meta description
content_type (string): Response Content-Type header when available
canonical_url (string): Canonical page URL
json_ld_types (string[]): Structured data @type values
selector_results (object[]): Optional selector text matches
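Because only some output fields are marked required, a consumer may want to check for them and treat the rest as optional. The Python sketch below shows one way to do that. The function names are hypothetical, and the inner shape of the links and images objects is not documented here, so the code only counts them.

```python
# Fields marked "required" in the Output field list above.
REQUIRED_OUTPUT_FIELDS = ("url", "text", "title", "final_url", "word_count")


def missing_required_fields(result):
    """Return the required output fields absent from a result dict."""
    return [name for name in REQUIRED_OUTPUT_FIELDS if name not in result]


def summarize(result):
    """Build a one-line summary, tolerating absent optional fields."""
    status = result.get("status", "unknown")
    links = len(result.get("links", []))
    images = len(result.get("images", []))
    return (
        f"{result['final_url']} [status {status}] "
        f"{result['word_count']} words, {links} links, {images} images"
    )
```

Calling missing_required_fields before summarize avoids a KeyError when a response is incomplete.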
Examples
example-page
{
"url": "https://example.com",
"max_links": 10,
"selectors": [
"h1",
"p"
],
"max_images": 5,
"max_text_chars": 2000
}
Use cases
FAQ
How is Cheerio Scraper different from Web Scraper?
Cheerio Scraper uses raw HTTP and is faster for static HTML pages. Web Scraper uses the browser path and is better for pages that need client-side JavaScript rendering.
Does this version crawl recursively?
No. Version 0.1 focuses on one page per run. Recursive queues, glob filters, pseudo-URLs, and enqueue behavior should be added as separately validated slices.
Can it run arbitrary page functions?
No. It returns a fixed, safe structured extraction. Custom Node page functions are intentionally out of scope for this first marketplace version.
Use it anywhere
MCP (Claude, Cursor, any client)
# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
--header "Authorization: Bearer bf_your_key_here"
# Then ask for the tool by name: cheerio_scraper
REST
curl -sS -X POST "https://betterfetch.co/api/tools/cheerio_scraper/run" \
-H "Authorization: Bearer bf_your_key_here" \
-H "Content-Type: application/json" \
-d '{"input": {"url":"https://example.com","max_links":10,"selectors":["h1","p"],"max_images":5,"max_text_chars":2000}}'
Run locally
git clone https://github.com/better-fetch/cheerio-scraper && cd cheerio-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://example.com","max_links":10,"selectors":["h1","p"],"max_images":5,"max_text_chars":2000}'
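For callers who prefer Python to curl, the request from the REST section could be reproduced with the standard library alone, as in the sketch below. The endpoint, headers, and {"input": ...} body wrapper are copied from the curl command. The function names and the 30-second timeout are assumptions, and the response envelope is not documented here, so run simply returns whatever JSON the service sends back.

```python
import json
import urllib.request

# Endpoint copied from the curl example in the REST section.
RUN_URL = "https://betterfetch.co/api/tools/cheerio_scraper/run"


def build_run_request(api_key, tool_input):
    """Build (but do not send) the POST request for one cheerio_scraper run."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(api_key, tool_input, timeout=30):
    """Send the request and return the decoded JSON response body."""
    request = build_run_request(api_key, tool_input)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the payload easy to inspect or log before any network call is made, for example run("bf_your_key_here", {"url": "https://example.com"}).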