Cheerio Scraper

Scrape one raw HTML page over HTTP into structured metadata, readable text, links, images, JSON-LD types, response metadata, and optional simple selector matches.

Developer Tools · v0.1.0 · ~1 credit/run · Apify Store · Source on GitHub

Overview

Cheerio Scraper is the fast HTTP-first extractor in the Better Fetch marketplace. Provide a public URL and the tool retrieves the raw response without launching a browser, then normalizes page metadata, readable text, link and image inventories, JSON-LD types, content type, status, byte size, and lightweight selector matches.

Last validated: Jul 3, 2026

Input

url: string (uri), required

The public http(s) page URL to scrape over raw HTTP.

max_links: integer, default 25

Maximum links to return.

selectors: string[]

Optional simple selectors to extract text from, e.g. h1, main, article, .price, #content.

max_images: integer, default 10

Maximum images to return.

max_text_chars: integer, default 12000

Maximum readable text characters to return.
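
The input fields above can be assembled and sanity-checked before a run. The following is a minimal sketch, not part of the tool itself: `build_input` is a hypothetical helper, the defaults are copied from the field list above, and the non-negative check on the limits is an assumption, since the schema shown here does not state a minimum.

```python
from urllib.parse import urlparse

# Defaults copied from the Input field list above.
DEFAULTS = {"max_links": 25, "max_images": 10, "max_text_chars": 12000}


def build_input(url, selectors=None, **limits):
    """Return an input dict for cheerio_scraper (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http(s) URL, got {url!r}")

    unknown = set(limits) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    payload = {"url": url, **DEFAULTS, **limits}
    for name in DEFAULTS:
        value = payload[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        # Assumption: limits must be non-negative; the schema does not say.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")

    # selectors is optional, so it is only included when provided.
    if selectors:
        payload["selectors"] = list(selectors)
    return payload
```

Calling `build_input("https://example.com", selectors=["h1", "p"], max_links=10)` yields a dict with the two remaining limits filled in from their defaults.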

Output

url: string, required

Requested URL

text: string, required

Readable page text

links: object[]

Links discovered in page order

title: string, required

Page title

images: object[]

Images discovered in page order

status: integer

HTTP status code returned by Better Fetch

final_url: string, required

Final URL after redirects

html_bytes: integer

Raw HTML byte length

word_count: integer, required

Words in extracted text

description: string

Meta description

content_type: string

Response Content-Type header when available

canonical_url: string

Canonical page URL

json_ld_types: string[]

Structured data @type values

selector_results: object[]

Optional selector text matches
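
Because only some output fields are marked required, consumers may want to treat the rest defensively. The sketch below assumes the output arrives as a plain dict keyed by the field names listed above; `summarize` and `REQUIRED_FIELDS` are hypothetical names, and the internal shape of the `links` and `images` objects is not relied on.

```python
# Fields marked "required" in the Output list above.
REQUIRED_FIELDS = ("url", "text", "title", "final_url", "word_count")


def summarize(result):
    """Condense one cheerio_scraper output dict into a few headline values."""
    missing = [name for name in REQUIRED_FIELDS if name not in result]
    if missing:
        raise KeyError(f"result is missing required fields: {missing}")

    return {
        "title": result["title"],
        "word_count": result["word_count"],
        # Naive string comparison: a trailing-slash difference counts as a redirect.
        "redirected": result["final_url"] != result["url"],
        # Optional fields may be absent, so read them with .get().
        "status": result.get("status"),
        "content_type": result.get("content_type"),
        "link_count": len(result.get("links") or []),
        "image_count": len(result.get("images") or []),
        "json_ld_types": list(result.get("json_ld_types") or []),
    }
```

Reading optional fields through `.get()` keeps the summary usable when, for example, a page has no JSON-LD or the Content-Type header is unavailable.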

Examples

example-page

{
  "url": "https://example.com",
  "max_links": 10,
  "selectors": [
    "h1",
    "p"
  ],
  "max_images": 5,
  "max_text_chars": 2000
}

Use cases

Fast HTML extraction

Collect titles, descriptions, canonical URLs, readable text, links, and images from static pages without paying the browser-rendering cost.

Crawler building blocks

Use the raw HTTP response fields, link inventory, and content-type metadata as a first pass before building a deeper domain-specific crawler.
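
As one way to use the link inventory as a first pass, the sketch below picks same-host follow-up URLs out of a single run's output. It is an illustration under stated assumptions: `next_urls` is a hypothetical helper, and the key holding each link's address (`url` here) is a guess, because the shape of the `links` objects is not documented above. Pass a different `link_key` if the real field name differs.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def next_urls(result, link_key="url"):
    """Return unique same-host http(s) URLs found in result["links"].

    Assumption: each link object stores its address under `link_key`.
    """
    base = result["final_url"]
    host = urlparse(base).netloc
    seen = {urldefrag(base).url}  # never re-queue the page itself
    found = []
    for link in result.get("links") or []:
        href = link.get(link_key)
        if not href:
            continue
        # Resolve relative links and drop #fragments before comparing.
        absolute = urldefrag(urljoin(base, href)).url
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https") or parsed.netloc != host:
            continue
        if absolute not in seen:
            seen.add(absolute)
            found.append(absolute)
    return found
```

Each returned URL could then be submitted as its own run, since this version handles one page per run.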

Agent page inspection

Give an agent clean static-page data for documentation pages, blogs, product pages, and directories that do not require JavaScript rendering.

FAQ

How is Cheerio Scraper different from Web Scraper?

Cheerio Scraper uses raw HTTP and is faster for static HTML pages. Web Scraper uses the browser path and is better for pages that need client-side JavaScript rendering.

Does this version crawl recursively?

No. Version 0.1 processes one page per run. Recursive queues, glob filters, pseudo-URLs, and enqueue behavior are out of scope for now and should be added later as separately validated features.

Can it run arbitrary page functions?

No. It returns a fixed, safe structured extraction. Custom Node page functions are intentionally out of scope for this first marketplace version.

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: cheerio_scraper

REST

curl -sS -X POST "https://betterfetch.co/api/tools/cheerio_scraper/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"url":"https://example.com","max_links":10,"selectors":["h1","p"],"max_images":5,"max_text_chars":2000}}'
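
The same REST call can be made from Python with only the standard library. This is a minimal sketch mirroring the curl command above: the endpoint, headers, and `{"input": ...}` body come from that command, while `build_request` and `run` are hypothetical helper names. It assumes the response body is JSON and returns it as-is, since the response envelope is not documented here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://betterfetch.co/api/tools/cheerio_scraper/run"


def build_request(api_key, tool_input):
    """Build the POST request without sending it."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(api_key, tool_input, timeout=30):
    """Send the request and return the decoded JSON body.

    Assumption: the API responds with JSON. urlopen raises
    urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_request(api_key, tool_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run("bf_your_key_here", {"url": "https://example.com", "max_links": 10})` sends the same kind of payload as the curl command.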

Run locally

git clone https://github.com/better-fetch/cheerio-scraper && cd cheerio-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://example.com","max_links":10,"selectors":["h1","p"],"max_images":5,"max_text_chars":2000}'