Wikipedia Article Scraper
Search Wikipedia or fetch a specific article through the official MediaWiki API, returning titles, page IDs, canonical URLs, summaries or full extracts, categories, links, images, revision metadata, and Wikidata IDs.
Overview
Wikipedia Article Scraper gives agents a polite, official-API route into Wikipedia content. Search by keyword or fetch an exact article title or URL, then receive compact rows with page title, page ID, canonical URL, language, summary or full plain-text extract, categories, internal links, thumbnail and original image URLs, revision metadata, page length, and Wikidata ID when available. It is designed for knowledge-base enrichment, RAG ingestion, entity research, article monitoring, SEO research, educational tools, and AI workflows that need citation-friendly Wikipedia context without scraping raw HTML.
Last validated: Jul 3, 2026
Input
mode ("page" | "search", default: "page"): Fetch one exact article page or search Wikipedia for matching pages.
query (string): Search query for search mode.
extract ("intro" | "full", default: "intro"): Return the introductory extract or a bounded full article extract.
language (string, default: "en"): Wikipedia language subdomain, e.g. en, de, fr, es, pt, or simple.
page_url (string): Wikipedia article URL for page mode, e.g. https://en.wikipedia.org/wiki/Web_scraping.
page_title (string): Exact Wikipedia page title for page mode, e.g. Web scraping.
max_results (integer, default: 5): Maximum search results to return in search mode.
include_links (boolean, default: false): Include a bounded comma-separated list of internal page links.
max_extract_chars (integer, default: 4000): Maximum characters of plain-text extract to return per page.
include_categories (boolean, default: true): Include visible Wikipedia category names when available.
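As a rough illustration of how the parameters above fit together, the sketch below merges caller overrides onto the documented defaults and rejects obviously inconsistent combinations. The `build_input` helper and its checks are hypothetical; the tool's own validation rules are not documented here and may differ.

```python
from typing import Any

# Defaults copied from the Input list; the helper below is hypothetical.
DEFAULTS: dict[str, Any] = {
    "mode": "page",
    "extract": "intro",
    "language": "en",
    "max_results": 5,
    "include_links": False,
    "max_extract_chars": 4000,
    "include_categories": True,
}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge caller overrides onto the documented defaults and sanity-check them."""
    payload = {**DEFAULTS, **overrides}
    if payload["mode"] not in ("page", "search"):
        raise ValueError("mode must be 'page' or 'search'")
    if payload["extract"] not in ("intro", "full"):
        raise ValueError("extract must be 'intro' or 'full'")
    # Assumption: search mode needs a query, page mode needs a title or URL.
    if payload["mode"] == "search" and not payload.get("query"):
        raise ValueError("search mode needs a query")
    if payload["mode"] == "page" and not (
        payload.get("page_title") or payload.get("page_url")
    ):
        raise ValueError("page mode needs page_title or page_url")
    return payload
```

Calling `build_input(page_title="Web scraping", include_links=True)` yields a page-mode payload with every other field left at its documented default.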
Output
mode (string, required): Mode used for this run.
count (integer, required): Number of returned article records.
query (string): Search query or page title used.
articles (object[], required): Structured Wikipedia article records.
language (string, required): Wikipedia language edition used.
source_url (string, required): MediaWiki API URL fetched.
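A consumer may want to confirm that a response has the documented top-level shape before reading `articles`. The following is a minimal sketch of such a check; `check_response` is a hypothetical helper, and the idea that `count` equals the length of `articles` is an assumption drawn from the field descriptions rather than a documented guarantee.

```python
from typing import Any

# Required top-level fields and their types, per the Output list.
REQUIRED_FIELDS = {
    "mode": str,
    "count": int,
    "articles": list,
    "language": str,
    "source_url": str,
}


def check_response(output: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in output:
            problems.append(f"missing required field: {name}")
        elif not isinstance(output[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Assumption: count mirrors the number of article records returned.
    if not problems and output["count"] != len(output["articles"]):
        problems.append("count does not match len(articles)")
    return problems
```

The keys inside each article record are deliberately not checked, since their exact names are not listed in the Output section.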
Examples
web-scraping-page
{
  "mode": "page",
  "extract": "intro",
  "language": "en",
  "page_title": "Web scraping",
  "include_links": true,
  "include_categories": true
}
ai-search
{
  "mode": "search",
  "query": "artificial intelligence",
  "language": "en",
  "max_results": 3
}
Use cases
FAQ
Does Wikipedia Article Scraper scrape Wikipedia HTML pages?
No. Version 0.1 uses the official MediaWiki Action API through Better Fetch and sends an explicit tool user agent. It returns normalized JSON data instead of parsing rendered HTML.
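For readers unfamiliar with the MediaWiki Action API, the sketch below builds the kind of query URL that returns a plain-text intro extract, the canonical URL, and the Wikidata ID for one page. The parameter names (`action=query`, `prop=extracts|info|pageprops`, `exintro`, `explaintext`, `inprop=url`, `ppprop=wikibase_item`) are standard Action API parameters, but the exact request the tool sends is not documented here, so treat this as an illustration only. `action_api_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def action_api_url(title: str, language: str = "en") -> str:
    """Build a MediaWiki Action API query URL for one page's intro extract."""
    params = {
        "action": "query",
        "format": "json",
        "redirects": 1,                 # follow redirects to the target page
        "titles": title,
        "prop": "extracts|info|pageprops",
        "exintro": 1,                   # intro section only
        "explaintext": 1,               # plain text instead of HTML
        "inprop": "url",                # adds the canonical URL to page info
        "ppprop": "wikibase_item",      # Wikidata ID, when the page has one
    }
    return f"https://{language}.wikipedia.org/w/api.php?{urlencode(params)}"
```

No request is made by this function; it only shows how a structured API query replaces HTML parsing.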
Can it fetch full article text?
Yes. Set extract to full. To keep tool responses manageable for agents and MCP clients, max_extract_chars (default 4000) caps the length of the returned plain-text extract.
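One way such a character bound could work is sketched below: trim to the limit, then back up to the last space so the extract does not end mid-word. How the tool actually truncates is not documented, so `bound_extract` is a hypothetical illustration rather than the tool's behavior.

```python
def bound_extract(text: str, max_chars: int = 4000) -> str:
    """Trim text to at most max_chars, preferring to cut at a space."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    head, _, _ = cut.rpartition(" ")
    # Fall back to a hard cut when there is no usable space in the window.
    return head.rstrip() or cut
```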
Does it support non-English Wikipedia editions?
Yes. Set language to a Wikipedia language subdomain such as en, de, fr, es, pt, or simple. Version 0.1 validates the language subdomain and calls that edition's MediaWiki API.
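Validating the language value matters because it becomes part of a hostname. The sketch below shows one plausible check; the regular expression is an assumption (lowercase letters with optional hyphenated parts, which covers codes such as en, simple, and zh-min-nan), and the tool's real validation rule may be stricter or looser. `api_endpoint` is a hypothetical helper name.

```python
import re

# Assumed pattern: lowercase letters, optionally followed by hyphenated parts.
_LANGUAGE_RE = re.compile(r"[a-z]{2,12}(?:-[a-z]{1,12})*")


def api_endpoint(language: str = "en") -> str:
    """Return the Action API endpoint for a Wikipedia language edition."""
    code = language.strip().lower()
    if not _LANGUAGE_RE.fullmatch(code):
        raise ValueError(f"invalid Wikipedia language subdomain: {language!r}")
    return f"https://{code}.wikipedia.org/w/api.php"
```

Rejecting anything with dots, slashes, or other punctuation keeps a malformed value from redirecting the request to a different host.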
Use it anywhere
MCP (Claude, Cursor, any client)
# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"
# Then ask for the tool by name: wikipedia_article_scraper
REST
curl -sS -X POST "https://betterfetch.co/api/tools/wikipedia_article_scraper/run" \
-H "Authorization: Bearer bf_your_key_here" \
-H "Content-Type: application/json" \
-d '{"input": {"mode":"page","extract":"intro","language":"en","page_title":"Web scraping","include_links":true,"include_categories":true}}'
Run locally
git clone https://github.com/better-fetch/tools.git && cd tools/tools/wikipedia-article-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"mode":"page","extract":"intro","language":"en","page_title":"Web scraping","include_links":true,"include_categories":true}'
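The REST request shown earlier can also be issued from Python using only the standard library. This is a minimal sketch: the endpoint, headers, and `{"input": ...}` body wrapper are taken from the curl example, while the helper names and the assumption that the endpoint replies with a JSON body are mine.

```python
import json
import urllib.request

ENDPOINT = "https://betterfetch.co/api/tools/wikipedia_article_scraper/run"


def build_request(tool_input: dict, api_key: str) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def run_tool(tool_input: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the reply (assumed to be JSON)."""
    request = build_request(tool_input, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending makes it possible to inspect or log the payload before spending an API call.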