arXiv Paper Scraper
Search arXiv or fetch a specific arXiv ID through the official arXiv Atom API, returning normalized preprint metadata with titles, abstracts, authors, categories, dates, DOI values, journal references, PDF URLs, and arXiv URLs.
Overview
arXiv Paper Scraper gives agents a polite, official-API path into open preprint metadata. Search by arXiv query syntax, category, title, author, or abstract text, or fetch an exact arXiv ID such as 2303.08774. The tool normalizes arXiv Atom XML into compact JSON rows with paper IDs, version labels, titles, abstracts, authors, affiliations, primary categories, all categories, publication and update timestamps, DOI values, journal references, comments, abstract URLs, and PDF URLs. It is designed for literature monitoring, AI research scouting, RAG corpus selection, academic trend tracking, reading-list generation, and research-agent workflows that need source URLs and structured paper metadata without browser scraping.
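The normalization step described above can be sketched roughly as follows. This is a minimal illustration using Python's standard library, not the tool's actual implementation: the output key names, the helper names `parse_entry` and `parse_feed`, and the sample feed (a synthetic placeholder entry, not a real paper) are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv Atom feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Synthetic placeholder entry for illustration only; not a real paper.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v2</id>
    <updated>2000-01-02T00:00:00Z</updated>
    <published>2000-01-01T00:00:00Z</published>
    <title>Placeholder
      Title</title>
    <summary>  Placeholder abstract.  </summary>
    <author><name>Author One</name></author>
    <author><name>Author Two</name></author>
    <arxiv:primary_category term="cs.CL"/>
    <category term="cs.CL"/>
    <category term="cs.LG"/>
    <link href="http://arxiv.org/abs/0000.00000v2" rel="alternate" type="text/html"/>
    <link title="pdf" href="http://arxiv.org/pdf/0000.00000v2" rel="related" type="application/pdf"/>
  </entry>
</feed>"""


def _text(node, path):
    """Return whitespace-collapsed text at path, or None if the element is absent."""
    found = node.find(path, NS)
    if found is None or found.text is None:
        return None
    return " ".join(found.text.split())


def parse_entry(entry):
    """Flatten one Atom <entry> into a plain dict (key names are assumed)."""
    abs_url = _text(entry, "atom:id") or ""
    # The trailing segment looks like "0000.00000v2": the ID plus a version label.
    versioned = abs_url.rsplit("/abs/", 1)[-1]
    match = re.fullmatch(r"(.+?)(v\d+)?", versioned)
    arxiv_id, version = (match.group(1), match.group(2)) if match else (None, None)
    links = entry.findall("atom:link", NS)
    pdf_url = next((l.get("href") for l in links if l.get("title") == "pdf"), None)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": arxiv_id,
        "version": version,
        "title": _text(entry, "atom:title"),
        "abstract": _text(entry, "atom:summary"),
        "authors": [_text(a, "atom:name") for a in entry.findall("atom:author", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "published": _text(entry, "atom:published"),
        "updated": _text(entry, "atom:updated"),
        "doi": _text(entry, "arxiv:doi"),
        "journal_ref": _text(entry, "arxiv:journal_ref"),
        "abs_url": abs_url or None,
        "pdf_url": pdf_url,
    }


def parse_feed(xml_text):
    """Parse an Atom feed string into a list of normalized paper dicts."""
    root = ET.fromstring(xml_text)
    return [parse_entry(e) for e in root.findall("atom:entry", NS)]


papers = parse_feed(SAMPLE_FEED)
```

Optional elements such as the DOI and journal reference come back as `None` when the feed omits them, which keeps every row the same shape.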
Last validated: Jul 3, 2026
Playground
Input
mode ("search" | "id", default: "search"): Search arXiv or fetch one exact arXiv ID.
query (string): Raw arXiv search query using arXiv syntax, e.g. all:large language models or ti:transformer AND au:vaswani.
start (integer, default: 0): Zero-based arXiv result offset for search mode.
title (string): Title text to search with the arXiv ti: field.
author (string): Author name to search with the arXiv au: field.
sort_by ("relevance" | "lastUpdatedDate" | "submittedDate", default: "relevance"): arXiv sort field for search mode.
abstract (string): Abstract text to search with the arXiv abs: field.
arxiv_id (string): Exact arXiv ID or arXiv URL, e.g. 2303.08774, 2303.08774v6, or https://arxiv.org/abs/2303.08774.
category (string): Optional arXiv category such as cs.CL, cs.LG, stat.ML, quant-ph, or hep-th.
sort_order ("ascending" | "descending", default: "descending"): arXiv sort order for search mode.
max_results (integer, default: 10): Maximum papers to return in search mode.
include_abstract (boolean, default: true): Include abstract/summary text when available.
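To make the mapping from these inputs to an arXiv API request more concrete, here is a hedged sketch. The `search_query`, `id_list`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters belong to the public arXiv API; how the tool actually combines the field inputs is not documented here, so joining them with AND, quoting multi-word phrases, and the helper names `normalize_arxiv_id` and `build_url` are assumptions.

```python
import re
from urllib.parse import urlencode

# Public arXiv API query endpoint.
API_BASE = "http://export.arxiv.org/api/query"


def normalize_arxiv_id(value):
    """Accept a bare ID, a versioned ID, or an arXiv abs/pdf URL."""
    value = value.strip()
    value = re.sub(r"^https?://arxiv\.org/(abs|pdf)/", "", value)
    return re.sub(r"\.pdf$", "", value)


def _term(prefix, text):
    """Prefix text with an arXiv field, quoting multi-word phrases."""
    text = text.strip()
    if " " in text:
        text = f'"{text}"'
    return f"{prefix}:{text}"


def build_url(params):
    """Build an arXiv API URL from tool-style inputs (assumed combination rules)."""
    if params.get("mode", "search") == "id":
        return f"{API_BASE}?{urlencode({'id_list': normalize_arxiv_id(params['arxiv_id'])})}"

    fields = [("ti", "title"), ("au", "author"), ("abs", "abstract"), ("cat", "category")]
    clauses = [_term(prefix, params[key]) for prefix, key in fields if params.get(key)]
    raw = params.get("query")
    if raw:
        # Parenthesize the raw query only when it is combined with other clauses.
        clauses.insert(0, f"({raw})" if clauses else raw)
    if not clauses:
        raise ValueError("search mode needs a query or at least one field input")

    query = {
        "search_query": " AND ".join(clauses),
        "start": params.get("start", 0),
        "max_results": params.get("max_results", 10),
        "sortBy": params.get("sort_by", "relevance"),
        "sortOrder": params.get("sort_order", "descending"),
    }
    return f"{API_BASE}?{urlencode(query)}"


id_url = build_url({"mode": "id", "arxiv_id": "https://arxiv.org/abs/2303.08774"})
search_url = build_url({
    "query": 'all:"large language models"',
    "category": "cs.CL",
    "sort_by": "submittedDate",
    "max_results": 3,
})
```

Parenthesizing the raw query keeps any OR inside it from interacting with the AND that joins it to the category clause.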
Output
mode (string, required): Mode used for this run
count (integer, required): Number of returned paper records
query (string): Search query used
papers (object[], required): Structured arXiv paper records
arxiv_id (string): Exact arXiv ID used
source_url (string, required): arXiv Atom API URL fetched
start_index (integer): arXiv result start index
total_matches (integer): Total arXiv matches reported by OpenSearch metadata
items_per_page (integer): arXiv result page size
acknowledgement (string, required): arXiv acknowledgement statement
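The paging fields above (start_index, count, total_matches) are enough to work out the start value for a follow-up run. A minimal sketch, assuming the output arrives as a plain dict with those keys; `next_start` is a hypothetical helper and the numbers in the usage lines are illustrative only.

```python
def next_start(result):
    """Return the `start` offset for the next page, or None when no more results remain."""
    count = result.get("count", 0)
    if count == 0:
        return None  # an empty page means there is nothing further to request
    following = result.get("start_index", 0) + count
    total = result.get("total_matches", 0)
    return following if following < total else None


# Illustrative values, not real API output.
first_page = {"count": 3, "start_index": 0, "total_matches": 10}
last_page = {"count": 1, "start_index": 9, "total_matches": 10}
offset_after_first = next_start(first_page)
offset_after_last = next_start(last_page)
```

Passing the returned offset as the start input of the next run would continue where the previous page ended.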
Examples
gpt4-paper
{
  "mode": "id",
  "arxiv_id": "2303.08774"
}
recent-llm-search
{
  "mode": "search",
  "query": "all:\"large language models\"",
  "sort_by": "submittedDate",
  "category": "cs.CL",
  "sort_order": "descending",
  "max_results": 3
}
Use cases
FAQ
Does arXiv Paper Scraper require an API key?
No. Version 0.1 uses the official public arXiv Atom API through Better Fetch. It keeps each run to a single bounded query (retrying only when arXiv returns a transient empty feed) and does not require login cookies or private credentials.
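As an illustration of that bounded-retry idea, the sketch below retries a fetch only while it comes back empty and stops after a fixed number of attempts. The attempt count, the delay, and the `fetch_with_retry` helper are assumptions rather than the tool's actual policy, and the caller supplies the fetch function.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, delay_seconds=3.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty list or the attempts run out."""
    papers = []
    for attempt in range(1, max_attempts + 1):
        papers = fetch()
        if papers:
            break
        if attempt < max_attempts:
            sleep(delay_seconds)  # pause before the next attempt
    return papers


# Simulated fetcher: two empty feeds, then one placeholder record.
_responses = iter([[], [], [{"arxiv_id": "0000.00000"}]])
result = fetch_with_retry(lambda: next(_responses), sleep=lambda seconds: None)
```

An empty feed can also be a genuine zero-result answer, which is why the loop is capped instead of retrying indefinitely.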
Can it fetch PDFs or full paper text?
No. It returns metadata, abstracts, arXiv abstract URLs, and PDF URLs. It does not download PDFs, parse TeX sources, or extract full paper bodies.
Is this tool affiliated with arXiv?
No. Thank you to arXiv for use of its open access interoperability. This tool was not reviewed or approved by, nor does it necessarily express or reflect the policies or opinions of, arXiv.
Use it anywhere
MCP (Claude, Cursor, any client)
# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"
# Then ask for the tool by name: arxiv_paper_scraper
REST
curl -sS -X POST "https://betterfetch.co/api/tools/arxiv_paper_scraper/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"mode":"id","arxiv_id":"2303.08774"}}'
Run locally
git clone https://github.com/better-fetch/tools && cd tools/tools/arxiv-paper-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"mode":"id","arxiv_id":"2303.08774"}'
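The REST call shown earlier with curl can also be issued from Python. The sketch below mirrors that request using only the standard library; the endpoint, headers, and body come from the curl example, while the helper names `build_request` and `run_tool`, the timeout, and the assumption that the response body is JSON are mine.

```python
import json
import urllib.request

ENDPOINT = "https://betterfetch.co/api/tools/arxiv_paper_scraper/run"


def build_request(tool_input, api_key):
    """Build the POST request without sending it."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_tool(tool_input, api_key, timeout=30):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(tool_input, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request has no side effects; call run_tool(...) to actually send it.
request = build_request({"mode": "id", "arxiv_id": "2303.08774"}, "bf_your_key_here")
```

Keeping request construction separate from sending makes the payload easy to inspect before any network traffic happens.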