Extract Article

Turn any article URL into clean, structured text: title, byline, publish date, and full body text, ready for LLM context windows. Works on lightly paywalled and bot-protected pages via the stealth engine.

Content extraction · v0.1.0 · ~1 credit/run · Source on GitHub

Overview

Extract Article turns messy web pages into clean text that agents can summarize, cite, classify, or store. It fetches the article through Better Fetch, prefers article and main content regions, preserves useful metadata, and returns a compact payload designed for LLM context windows.

Last validated: Jul 3, 2026

Playground

Input

url · string (uri) · required

The article URL to extract

max_chars · integer · default: 100000

Truncate the extracted text to this many characters

Output

text · string · required

Clean article body text

title · string · required

Article title (og:title preferred)

byline · string

Author, when detectable

final_url · string · required

URL after redirects

published · string

Publish date (ISO where available)

site_name · string

Publication name

word_count · integer

Words in the extracted text
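
The output fields above map naturally onto a small typed record. The following is a minimal Python sketch, not part of the tool itself: the `Article` class and `parse_article` helper are hypothetical names, and the sketch assumes you already hold the decoded output object as a dictionary keyed by the field names listed above. It treats `text`, `title`, and `final_url` as required and everything else as optional.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Article:
    # Required fields, per the Output list.
    text: str
    title: str
    final_url: str
    # Optional fields: may be absent when the tool cannot detect them.
    byline: Optional[str] = None
    published: Optional[str] = None
    site_name: Optional[str] = None
    word_count: Optional[int] = None


def parse_article(payload: Mapping[str, Any]) -> Article:
    """Check the required keys and wrap the output object in an Article."""
    missing = [k for k in ("text", "title", "final_url") if k not in payload]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return Article(
        text=payload["text"],
        title=payload["title"],
        final_url=payload["final_url"],
        byline=payload.get("byline"),
        published=payload.get("published"),
        site_name=payload.get("site_name"),
        word_count=payload.get("word_count"),
    )
```

Failing fast on a missing required field keeps a malformed response from propagating silently into downstream storage or prompts.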

Examples

wikipedia-web-scraping

{
  "url": "https://en.wikipedia.org/wiki/Web_scraping",
  "max_chars": 5000
}

Use cases

Research ingestion

Convert article URLs into normalized text and metadata before saving them to a knowledge base, vector store, spreadsheet, or editorial workflow.

Agent context prep

Let an AI agent fetch a source article and receive clean title, byline, publish date, final URL, word count, and body text in one structured response.
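
One way an agent might turn that structured response into prompt context is sketched below. The `format_context` helper and the header labels ("Author", "Source", and so on) are illustrative choices, not something the tool prescribes; the only assumption about the tool is the set of output field names documented above.

```python
from typing import Any, Mapping


def format_context(article: Mapping[str, Any]) -> str:
    """Render an output object as a plain-text block for a model prompt."""
    header = [f"Title: {article['title']}"]
    # Optional metadata is included only when the tool returned it.
    for label, key in (("Author", "byline"), ("Published", "published"),
                       ("Site", "site_name"), ("Words", "word_count")):
        value = article.get(key)
        if value not in (None, ""):
            header.append(f"{label}: {value}")
    # final_url is the post-redirect URL, so it is the safer one to cite.
    header.append(f"Source: {article['final_url']}")
    return "\n".join(header) + "\n\n" + article["text"]
```

Putting the metadata in a short header ahead of the body lets a model cite the source and date without having to infer them from the article text.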

FAQ

Does Extract Article return the rendered page HTML?

No. The tool uses rendered and response HTML internally, then returns clean article text and metadata so downstream agents do not need to parse markup.

Can I limit the amount of returned article text?

Yes. Pass max_chars to cap the extracted text length, which helps keep long articles inside the context window budget for downstream models.
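
As a rough illustration of picking a max_chars value, the sketch below derives one from a token budget. It rests on two assumptions that are not from this page: roughly four characters per token, which is a common rule of thumb for English text and varies by tokenizer and language, and a cap at the documented default of 100000, since the page does not say whether larger values are accepted. The function name is hypothetical.

```python
def max_chars_for_budget(context_tokens: int, reserved_tokens: int,
                         chars_per_token: float = 4.0) -> int:
    """Estimate a max_chars value that fits the remaining token budget."""
    # Tokens left for the article after the prompt, instructions, and reply.
    available = context_tokens - reserved_tokens
    if available <= 0:
        raise ValueError("reserved_tokens leaves no room for article text")
    # Assumption: cap at the documented default of 100000 characters.
    return min(int(available * chars_per_token), 100_000)
```

For example, an 8000-token window with 2000 tokens reserved for instructions and the reply leaves 6000 tokens, which this heuristic maps to a max_chars of 24000.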

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: extract_article

REST

curl -sS -X POST "https://betterfetch.co/api/tools/extract_article/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"url":"https://en.wikipedia.org/wiki/Web_scraping","max_chars":5000}}'
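
The same call can be made from Python with only the standard library. This is a sketch mirroring the curl command above: the endpoint, headers, and `{"input": {...}}` body come from that command, while the function names and the 60-second timeout are assumptions. The shape of the response envelope is not documented on this page, so `run` returns the decoded JSON unchanged.

```python
import json
import urllib.request

API_URL = "https://betterfetch.co/api/tools/extract_article/run"


def build_request(api_key: str, url: str,
                  max_chars: int = 100_000) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"input": {"url": url, "max_chars": max_chars}}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run(api_key: str, url: str, max_chars: int = 100_000,
        timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response as-is."""
    req = build_request(api_key, url, max_chars)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Splitting request construction from sending makes the payload easy to inspect or unit-test without spending credits.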

Run locally

git clone https://github.com/better-fetch/extract-article && cd extract-article && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://en.wikipedia.org/wiki/Web_scraping","max_chars":5000}'