arXiv Paper Scraper

Search arXiv or fetch a specific arXiv ID through the official arXiv Atom API, returning normalized preprint metadata with titles, abstracts, authors, categories, dates, DOI values, journal references, PDF URLs, and arXiv URLs.

Research | v0.1.2 | ~1 credit/run | Source on GitHub

Overview

arXiv Paper Scraper gives agents a polite, official-API path into open preprint metadata. Search by arXiv query syntax, category, title, author, or abstract text, or fetch an exact arXiv ID such as 2303.08774. The tool normalizes arXiv Atom XML into compact JSON rows with paper IDs, version labels, titles, abstracts, authors, affiliations, primary categories, all categories, publication and update timestamps, DOI values, journal references, comments, abstract URLs, and PDF URLs. It is designed for literature monitoring, AI research scouting, RAG corpus selection, academic trend tracking, reading-list generation, and research-agent workflows that need source URLs and structured paper metadata without browser scraping.
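To make the Atom-to-JSON step above concrete, here is a minimal sketch of how one Atom entry might be flattened into a row. It is not the tool's actual implementation: the output key names, the `normalize_entry` and `parse_feed` helpers, and the hand-written sample feed (with a placeholder ID and dates) are all assumptions for illustration. Only the two XML namespaces are taken from the Atom and arXiv schemas.

```python
import json
import re
import xml.etree.ElementTree as ET

# Namespaces used by arXiv Atom feeds.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}

# Hand-written placeholder entry, NOT real API output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/1234.56789v2</id>
    <updated>2000-01-02T00:00:00Z</updated>
    <published>2000-01-01T00:00:00Z</published>
    <title>Example   Title
      Wrapped Over Lines</title>
    <summary>Example abstract.</summary>
    <author><name>Example Author</name></author>
    <arxiv:primary_category term="cs.CL"/>
    <category term="cs.CL"/>
    <category term="cs.AI"/>
    <link title="pdf" href="http://arxiv.org/pdf/1234.56789v2" rel="related"/>
  </entry>
</feed>"""


def clean(node):
    """Return whitespace-collapsed element text, or None if absent."""
    if node is None or node.text is None:
        return None
    return " ".join(node.text.split())


def normalize_entry(entry):
    """Flatten one <entry> into a dict (key names are hypothetical)."""
    abs_url = clean(entry.find("atom:id", NS))
    # Split "1234.56789v2" into the base ID and an optional version label.
    match = re.fullmatch(r"(.+?)(v\d+)?", abs_url.rsplit("/abs/", 1)[-1])
    pdf = entry.find("atom:link[@title='pdf']", NS)
    primary = entry.find("arxiv:primary_category", NS)
    return {
        "arxiv_id": match.group(1),
        "version": match.group(2),
        "title": clean(entry.find("atom:title", NS)),
        "abstract": clean(entry.find("atom:summary", NS)),
        "authors": [clean(a.find("atom:name", NS))
                    for a in entry.findall("atom:author", NS)],
        "primary_category": primary.get("term") if primary is not None else None,
        "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        "published": clean(entry.find("atom:published", NS)),
        "updated": clean(entry.find("atom:updated", NS)),
        "abs_url": abs_url,
        "pdf_url": pdf.get("href") if pdf is not None else None,
    }


def parse_feed(xml_text):
    """Parse an Atom feed string into a list of normalized rows."""
    root = ET.fromstring(xml_text)
    return [normalize_entry(e) for e in root.findall("atom:entry", NS)]


if __name__ == "__main__":
    print(json.dumps(parse_feed(SAMPLE_FEED), indent=2))
```

Collapsing whitespace in titles and abstracts matters because Atom text is often wrapped across lines in the feed.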

Last validated: Jul 3, 2026

Input

mode: "search" | "id" (default: "search")

Search arXiv or fetch one exact arXiv ID.

query: string

Raw arXiv search query using arXiv syntax, e.g. all:large language models or ti:transformer AND au:vaswani.

start: integer (default: 0)

Zero-based arXiv result offset for search mode.

title: string

Title text to search with the arXiv ti: field.

author: string

Author name to search with the arXiv au: field.

sort_by: "relevance" | "lastUpdatedDate" | "submittedDate" (default: "relevance")

arXiv sort field for search mode.

abstract: string

Abstract text to search with the arXiv abs: field.

arxiv_id: string

Exact arXiv ID or arXiv URL, e.g. 2303.08774, 2303.08774v6, or https://arxiv.org/abs/2303.08774.

category: string

Optional arXiv category such as cs.CL, cs.LG, stat.ML, quant-ph, or hep-th.

sort_order: "ascending" | "descending" (default: "descending")

arXiv sort order for search mode.

max_results: integer (default: 10)

Maximum papers to return in search mode.

include_abstract: boolean (default: true)

Include abstract/summary text when available.
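The structured search inputs (title, author, abstract, category) each map to an arXiv field prefix. The sketch below shows one plausible way such fields could be combined into a single arXiv query string. Joining the clauses with AND, quoting multi-word values, and the helper names `field_clause` and `build_search_query` are assumptions, not documented behavior of this tool.

```python
def field_clause(prefix, value):
    """Build one 'prefix:value' clause, quoting multi-word values as a phrase."""
    value = value.strip()
    if " " in value:
        value = f'"{value}"'
    return f"{prefix}:{value}"


def build_search_query(query=None, title=None, author=None,
                       abstract=None, category=None):
    """Combine a raw query and structured fields with AND (assumed policy)."""
    clauses = []
    for prefix, value in (("ti", title), ("au", author),
                          ("abs", abstract), ("cat", category)):
        if value:
            clauses.append(field_clause(prefix, value))
    if query:
        raw = query.strip()
        # Parenthesize the raw query only when it must be joined with others,
        # so its own AND/OR operators keep their grouping.
        clauses.insert(0, f"({raw})" if clauses else raw)
    if not clauses:
        raise ValueError("search mode needs a query or at least one field")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_search_query(title="transformer", author="vaswani"))
    print(build_search_query(query='all:"large language models"', category="cs.CL"))
```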

Output

mode: string (required)

Mode used for this run

count: integer (required)

Number of returned paper records

query: string

Search query used

papers: object[] (required)

Structured arXiv paper records

arxiv_id: string

Exact arXiv ID used

source_url: string (required)

arXiv Atom API URL fetched

start_index: integer

arXiv result start index

total_matches: integer

Total arXiv matches reported by OpenSearch metadata

items_per_page: integer

arXiv result page size

acknowledgement: string (required)

arXiv acknowledgement statement
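The start_index, total_matches, and items_per_page fields are enough to decide whether another page of results exists. A minimal sketch, assuming these fields carry the usual OpenSearch paging semantics and that the next run would pass the returned offset as the start input (the `next_start` helper is hypothetical):

```python
def next_start(start_index, items_per_page, total_matches):
    """Return the offset for the next page, or None when no page remains."""
    if items_per_page <= 0:
        return None
    candidate = start_index + items_per_page
    return candidate if candidate < total_matches else None


if __name__ == "__main__":
    # Hypothetical paging metadata, shaped like the output fields above.
    result = {"start_index": 0, "items_per_page": 3, "total_matches": 10}
    print(next_start(result["start_index"],
                     result["items_per_page"],
                     result["total_matches"]))
```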

Examples

gpt4-paper

{
  "mode": "id",
  "arxiv_id": "2303.08774"
}
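The arxiv_id input accepts a bare ID, a versioned ID, or an arXiv URL. The following sketch shows one way those three forms could be reduced to a base ID plus an optional version label. The regular expressions and the `parse_arxiv_id` helper are illustrative assumptions, and they only cover new-style IDs (YYMM.NNNNN) and the common old-style archive/NNNNNNN form.

```python
import re

# New-style IDs such as 2303.08774, optionally followed by a version like v6.
NEW_STYLE = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")
# Old-style IDs such as hep-th/9901001 or math.GT/0309136.
OLD_STYLE = re.compile(r"([a-z\-]+(?:\.[A-Z]{2})?/\d{7})(v\d+)?")
# Leading part of an arXiv abstract or PDF URL.
URL_PREFIX = re.compile(r"^https?://(?:www\.|export\.)?arxiv\.org/(?:abs|pdf)/")


def parse_arxiv_id(value):
    """Return (base_id, version_or_None) for an ID, versioned ID, or URL."""
    text = URL_PREFIX.sub("", value.strip())
    for pattern in (NEW_STYLE, OLD_STYLE):
        match = pattern.fullmatch(text)
        if match:
            return match.group(1), match.group(2)
    raise ValueError(f"not a recognizable arXiv ID: {value!r}")


if __name__ == "__main__":
    for raw in ("2303.08774", "2303.08774v6", "https://arxiv.org/abs/2303.08774"):
        print(raw, "->", parse_arxiv_id(raw))
```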

recent-llm-search

{
  "mode": "search",
  "query": "all:\"large language models\"",
  "sort_by": "submittedDate",
  "category": "cs.CL",
  "sort_order": "descending",
  "max_results": 3
}
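For orientation, a search input like the one above could translate into an arXiv Atom API request roughly as follows. The endpoint and the parameter names search_query, start, max_results, sortBy, and sortOrder come from the public arXiv API. How the tool folds category into the query is an assumption here, as is the `build_api_url` helper.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_api_url(tool_input):
    """Map a search-mode input dict onto arXiv API query parameters."""
    search_query = tool_input["query"]
    category = tool_input.get("category")
    if category:
        # Assumption: the category is ANDed onto the raw query.
        search_query = f"({search_query}) AND cat:{category}"
    params = {
        "search_query": search_query,
        "start": tool_input.get("start", 0),
        "max_results": tool_input.get("max_results", 10),
        "sortBy": tool_input.get("sort_by", "relevance"),
        "sortOrder": tool_input.get("sort_order", "descending"),
    }
    # urlencode percent-encodes quotes, colons, and parentheses for us.
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_api_url({
        "mode": "search",
        "query": 'all:"large language models"',
        "sort_by": "submittedDate",
        "category": "cs.CL",
        "sort_order": "descending",
        "max_results": 3,
    }))
```

The resulting URL is the kind of value the source_url output field would be expected to report.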

Use cases

AI research monitoring

Track recent papers by category or query, such as cs.CL, cs.LG, retrieval augmented generation, long context transformers, or diffusion models.

Reading-list generation

Fetch exact arXiv IDs or bounded search results with titles, abstracts, authors, PDF links, dates, and categories for human or agent review.

RAG corpus triage

Collect normalized preprint metadata and abstracts before deciding which papers should be downloaded, embedded, cited, or monitored.

FAQ

Does arXiv Paper Scraper require an API key?

No. Version 0.1 uses the official public arXiv Atom API through Better Fetch. It keeps each run to a single bounded query (retrying only when arXiv returns a transient empty feed) and does not require login cookies or private credentials.
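The retry behavior mentioned above can be pictured as a small bounded loop. This is a sketch under stated assumptions, not the tool's code: it treats any empty entry list as possibly transient, caps the attempts at three, and waits a little longer each time. The `fetch_with_retry` helper and its `fetch_feed` callback are hypothetical names.

```python
import time


def fetch_with_retry(fetch_feed, max_attempts=3, base_delay=3.0, sleep=time.sleep):
    """Call fetch_feed() until it returns entries or attempts run out.

    fetch_feed is any zero-argument callable returning a list of entries.
    The delay grows linearly (base_delay, 2 * base_delay, ...) to stay polite.
    """
    entries = []
    for attempt in range(1, max_attempts + 1):
        entries = fetch_feed()
        if entries:
            return entries
        if attempt < max_attempts:
            sleep(base_delay * attempt)
    # Still empty after all attempts: report the empty result as-is.
    return entries


if __name__ == "__main__":
    responses = iter([[], ["entry-1"]])  # first feed empty, second populated
    print(fetch_with_retry(lambda: next(responses), base_delay=0.1))
```

Injecting `sleep` keeps the loop easy to test without real delays.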

Can it fetch PDFs or full paper text?

No. It returns metadata, abstracts, arXiv abstract URLs, and PDF URLs. It does not download PDFs, parse TeX sources, or extract full paper bodies.

Is this tool affiliated with arXiv?

No. Thank you to arXiv for use of its open access interoperability. This tool was not reviewed or approved by, nor does it necessarily express or reflect the policies or opinions of, arXiv.

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: arxiv_paper_scraper

REST

curl -sS -X POST "https://betterfetch.co/api/tools/arxiv_paper_scraper/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"mode":"id","arxiv_id":"2303.08774"}}'
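The same REST call can be made from Python with only the standard library. This sketch mirrors the curl command above. That the response body is JSON is an assumption, and `build_run_request` and `run_tool` are hypothetical helper names.

```python
import json
import urllib.request

RUN_URL = "https://betterfetch.co/api/tools/arxiv_paper_scraper/run"


def build_run_request(tool_input, api_key):
    """Build the POST request with the same headers and body as the curl call."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_tool(tool_input, api_key, timeout=30):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_run_request(tool_input, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace the placeholder key before running; this performs a network call.
    print(run_tool({"mode": "id", "arxiv_id": "2303.08774"}, "bf_your_key_here"))
```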

Run locally

git clone https://github.com/better-fetch/tools && cd tools/tools/arxiv-paper-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"mode":"id","arxiv_id":"2303.08774"}'