Wikipedia Article Scraper

Search Wikipedia or fetch a specific article through the official MediaWiki API, returning titles, page IDs, canonical URLs, summaries or full extracts, categories, links, images, revision metadata, and Wikidata IDs.

Content extraction · v0.1.1 · ~1 credit/run · Source on GitHub

Overview

Wikipedia Article Scraper gives agents a polite, official-API route into Wikipedia content. Search by keyword or fetch an exact article title or URL, then receive compact rows with page title, page ID, canonical URL, language, summary or full plain-text extract, categories, internal links, thumbnail and original image URLs, revision metadata, page length, and Wikidata ID when available. It is designed for knowledge-base enrichment, RAG ingestion, entity research, article monitoring, SEO research, educational tools, and AI workflows that need citation-friendly Wikipedia context without scraping raw HTML.

Last validated: Jul 3, 2026

Playground

Input

mode: "page" | "search" (default: "page")

Fetch one exact article page or search Wikipedia for matching pages.

query: string

Search query for search mode.

extract: "intro" | "full" (default: "intro")

Return the introductory extract or a bounded full article extract.

language: string (default: "en")

Wikipedia language subdomain, e.g. en, de, fr, es, pt, or simple.

page_url: string

Wikipedia article URL for page mode, e.g. https://en.wikipedia.org/wiki/Web_scraping.

page_title: string

Exact Wikipedia page title for page mode, e.g. Web scraping.

max_results: integer (default: 5)

Maximum search results to return in search mode.

include_links: boolean (default: false)

Include a bounded comma-separated list of internal page links.

max_extract_chars: integer (default: 4000)

Maximum characters of plain-text extract to return per page.

include_categories: boolean (default: true)

Include visible Wikipedia category names when available.
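
As a rough illustration of how the input fields above fit together, the sketch below assembles an input object with the listed defaults. The helper name build_input is hypothetical and not part of the tool. The rules that search mode needs a query and that page mode needs page_title or page_url are assumptions drawn from the field descriptions, not documented validation behavior.

```python
from typing import Any, Dict, Optional


def build_input(
    mode: str = "page",
    query: Optional[str] = None,
    page_title: Optional[str] = None,
    page_url: Optional[str] = None,
    extract: str = "intro",
    language: str = "en",
    max_results: int = 5,
    include_links: bool = False,
    max_extract_chars: int = 4000,
    include_categories: bool = True,
) -> Dict[str, Any]:
    """Assemble an input object using the defaults from the Input section."""
    if mode not in ("page", "search"):
        raise ValueError('mode must be "page" or "search"')
    if extract not in ("intro", "full"):
        raise ValueError('extract must be "intro" or "full"')

    payload: Dict[str, Any] = {
        "mode": mode,
        "extract": extract,
        "language": language,
        "include_links": include_links,
        "max_extract_chars": max_extract_chars,
        "include_categories": include_categories,
    }
    if mode == "search":
        # Assumption: search mode is only meaningful with a non-empty query.
        if not query:
            raise ValueError("search mode needs a query")
        payload["query"] = query
        payload["max_results"] = max_results
    else:
        # Assumption: page mode needs at least one of page_title or page_url.
        if not (page_title or page_url):
            raise ValueError("page mode needs page_title or page_url")
        if page_title:
            payload["page_title"] = page_title
        if page_url:
            payload["page_url"] = page_url
    return payload
```

For instance, build_input(mode="search", query="artificial intelligence", max_results=3) produces a search input that also carries the default extract, language, and flag values.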

Output

mode: string (required)

Mode used for this run

count: integer (required)

Number of returned article records

query: string

Search query or page title used

articles: object[] (required)

Structured Wikipedia article records

language: string (required)

Wikipedia language edition used

source_url: string (required)

MediaWiki API URL fetched
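
A consumer may want to sanity-check a result before using it. The sketch below checks only the top-level fields listed above. The helper name envelope_problems is hypothetical, and the check that count equals the number of article records is an assumption based on the field description. The keys inside each article record are not specified here, so they are left unchecked.

```python
from typing import Any, Dict, List

# Required top-level output fields and their expected Python types.
REQUIRED_FIELDS = {
    "mode": str,
    "count": int,
    "articles": list,
    "language": str,
    "source_url": str,
}


def envelope_problems(result: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in result:
            problems.append(f"missing required field: {name}")
        elif not isinstance(result[name], expected):
            problems.append(f"{name} should be of type {expected.__name__}")

    # query is optional, but should be a string when present.
    if "query" in result and not isinstance(result["query"], str):
        problems.append("query should be of type str")

    # Assumption: count is the length of the articles array.
    articles = result.get("articles")
    count = result.get("count")
    if isinstance(articles, list) and isinstance(count, int):
        if count != len(articles):
            problems.append("count does not match number of articles")
    return problems
```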

Examples

web-scraping-page

{
  "mode": "page",
  "extract": "intro",
  "language": "en",
  "page_title": "Web scraping",
  "include_links": true,
  "include_categories": true
}

ai-search

{
  "mode": "search",
  "query": "artificial intelligence",
  "language": "en",
  "max_results": 3
}

Use cases

Knowledge-base enrichment

Fetch canonical Wikipedia titles, summaries, page URLs, images, categories, and Wikidata IDs to enrich company, person, place, product, or concept records.

RAG context collection

Collect bounded plain-text article extracts and source URLs for retrieval pipelines, research assistants, documentation bots, and educational AI agents.

Topic discovery

Search Wikipedia by keyword and return top matching pages with concise extracts, categories, thumbnails, and links before deciding which pages to crawl more deeply.

FAQ

Does Wikipedia Article Scraper scrape Wikipedia HTML pages?

No. Version 0.1 uses the official MediaWiki Action API through Better Fetch and sends an explicit tool user agent. It returns normalized JSON data instead of parsing rendered HTML.
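
For readers unfamiliar with the MediaWiki Action API, the sketch below shows what a comparable plain-text extract request can look like. It is not the tool's actual request, which is not documented here. The function name and the User-Agent string are hypothetical. The query parameters (action=query, prop=extracts, exintro, explaintext, redirects) are standard Action API and TextExtracts parameters.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Hypothetical identifier; a real client should name itself and give contact info.
USER_AGENT = "example-wikipedia-client/0.1 (contact@example.com)"


def mediawiki_extract_request(
    title: str, language: str = "en", intro_only: bool = True
) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a plain-text extract query."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": "1",  # plain text instead of limited HTML
        "redirects": "1",    # follow redirects to the target page
        "titles": title,
    }
    if intro_only:
        params["exintro"] = "1"  # only the content before the first section
    url = f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"
    return url, {"User-Agent": USER_AGENT}
```

The returned URL and headers can be handed to any HTTP client. Nothing is fetched by the function itself.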

Can it fetch full article text?

Yes. Set extract to full. To keep tool responses useful for agents and MCP clients, max_extract_chars caps the length of the returned plain-text extract (default: 4000 characters).
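
How the tool trims an extract is not documented here. As one plausible illustration of a character bound, the sketch below clips text to max_chars and backs up to the last space so a word is not split. The helper name bound_extract and the word-boundary behavior are assumptions.

```python
def bound_extract(text: str, max_chars: int = 4000) -> str:
    """Clip text to at most max_chars characters, preferring a word boundary."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    if len(text) <= max_chars:
        return text

    clipped = text[:max_chars]
    # If the cut lands inside a word, back up to the previous space.
    if not text[max_chars].isspace():
        cut = clipped.rfind(" ")
        if cut > 0:
            clipped = clipped[:cut]
    return clipped.rstrip()
```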

Does it support non-English Wikipedia editions?

Yes. Set language to a Wikipedia language subdomain such as en, de, fr, es, pt, or simple. Version 0.1 validates the language subdomain and calls that edition's MediaWiki API.
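
The exact validation rule is not given here. The sketch below shows one way such a check could work, under the assumption that a language subdomain consists of lowercase letters with optional hyphenated parts (en, simple, zh-min-nan). It also shows how the language and title might be recovered from an article URL. Both helper names are hypothetical.

```python
import re
from typing import Tuple
from urllib.parse import unquote, urlparse

# Assumed shape of a language subdomain: lowercase letters, optional hyphen parts.
LANGUAGE_RE = re.compile(r"[a-z]{2,}(?:-[a-z]+)*")
WIKI_SUFFIX = ".wikipedia.org"


def validate_language(language: str) -> str:
    """Return the language unchanged, or raise ValueError if it looks unsafe."""
    if not LANGUAGE_RE.fullmatch(language):
        raise ValueError(f"invalid language subdomain: {language!r}")
    return language


def split_page_url(page_url: str) -> Tuple[str, str]:
    """Split a Wikipedia article URL into (language, page title)."""
    parsed = urlparse(page_url)
    host = parsed.hostname or ""
    if not host.endswith(WIKI_SUFFIX) or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not a Wikipedia article URL: {page_url!r}")
    language = validate_language(host[: -len(WIKI_SUFFIX)])
    # Titles use underscores in URLs and spaces in display form.
    title = unquote(parsed.path[len("/wiki/"):]).replace("_", " ")
    return language, title
```

Checking the subdomain before building a hostname keeps values such as "evil.example/" from being spliced into the request URL.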

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: wikipedia_article_scraper

REST

curl -sS -X POST "https://betterfetch.co/api/tools/wikipedia_article_scraper/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"mode":"page","extract":"intro","language":"en","page_title":"Web scraping","include_links":true,"include_categories":true}}'
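
The same REST call can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl command above. The helper names are hypothetical, and the assumption that the endpoint responds with a JSON body is inferred from the Output section rather than confirmed.

```python
import json
import urllib.request
from typing import Any, Dict

RUN_URL = "https://betterfetch.co/api/tools/wikipedia_article_scraper/run"


def build_run_request(api_key: str, tool_input: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_tool(api_key: str, tool_input: Dict[str, Any], timeout: float = 30.0) -> Any:
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_request(api_key, tool_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling run_tool("bf_your_key_here", {"mode": "page", "page_title": "Web scraping"}) sends the request. build_run_request alone is useful for inspecting the payload first.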

Run locally

git clone https://github.com/better-fetch/tools && cd tools/tools/wikipedia-article-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"mode":"page","extract":"intro","language":"en","page_title":"Web scraping","include_links":true,"include_categories":true}'