bioRxiv and medRxiv Preprints Scraper

Fetch bioRxiv or medRxiv preprint metadata by date window, category, cursor, or DOI through the public bioRxiv API, returning normalized DOI, title, author, date, version, license, category, abstract, JATS XML, PDF, and published-article links.

Research · v0.1.1 · ~1 credit/run · Source on GitHub

Overview

bioRxiv and medRxiv Preprints Scraper gives agents a structured route into the public bioRxiv API for biology and health-science preprint metadata. Pull a bounded date window from bioRxiv or medRxiv, apply a subject-category filter, continue from a cursor, or fetch one exact preprint DOI. The tool normalizes the API response into compact rows with preprint DOI, server, title, authors, posted date, version, manuscript type, license, subject category, abstract text, corresponding author details, funder labels, published DOI when linked, article URL, PDF URL, and JATS XML URL. It is designed for preprint surveillance, biomedical and life-sciences literature review, science-news monitoring, pharma and biotech competitive intelligence, RAG corpus triage, weekly reading lists, and reproducible research snapshots.

Last validated: Jul 3, 2026

Playground

Input

doi (string)

Exact preprint DOI for DOI mode, e.g. 10.1101/2024.05.28.596311.

mode ("date_range" | "doi", default: "date_range")

Fetch preprints by date window or fetch one exact preprint DOI.

limit (integer, default: 10)

Maximum records to return from this single API response.

cursor (integer, default: 0)

bioRxiv API cursor/start offset for a date window.

server ("biorxiv" | "medrxiv", default: "biorxiv")

Preprint server to query.

date_to (string)

End date in YYYY-MM-DD format for date_range mode.

category (string)

Optional category filter such as neuroscience, cell_biology, or sports_medicine.

date_from (string)

Start date in YYYY-MM-DD format for date_range mode.

include_abstract (boolean, default: true)

Include abstract text when returned by the public API.
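The input fields above can be assembled and sanity-checked before a run. The following is a minimal Python sketch, not the tool's own validation: the helper name `build_input` is hypothetical, and the rules that DOI mode needs a `doi` and that date_range mode needs both dates in order are assumptions read off the field descriptions.

```python
from datetime import datetime

VALID_MODES = {"date_range", "doi"}
VALID_SERVERS = {"biorxiv", "medrxiv"}


def build_input(mode="date_range", server="biorxiv", doi=None,
                date_from=None, date_to=None, category=None,
                limit=10, cursor=0, include_abstract=True):
    """Assemble a tool input dict, raising ValueError on obvious mistakes.

    Hypothetical helper: the checks are assumptions, not documented behavior.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if server not in VALID_SERVERS:
        raise ValueError(f"unknown server: {server!r}")

    payload = {"mode": mode, "server": server,
               "include_abstract": include_abstract}

    if mode == "doi":
        if not doi:
            raise ValueError("doi mode needs a doi")
        payload["doi"] = doi
        return payload

    # date_range mode: assume both dates are needed and must be ordered.
    if not date_from or not date_to:
        raise ValueError("date_range mode needs date_from and date_to")
    start = datetime.strptime(date_from, "%Y-%m-%d")
    end = datetime.strptime(date_to, "%Y-%m-%d")
    if start > end:
        raise ValueError("date_from must not be after date_to")

    payload.update(date_from=date_from, date_to=date_to,
                   limit=limit, cursor=cursor)
    if category:
        payload["category"] = category
    return payload
```

Calling `build_input(date_from="2024-06-01", date_to="2024-06-03", limit=2, include_abstract=False)` produces a dict with the same fields as the first example input on this page, plus the default `cursor` of 0.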

Output

doi (string)

Exact DOI used

mode (string, required)

Mode used for this run

count (integer, required)

Number of returned preprints

cursor (integer)

Cursor/start offset used

server (string, required)

Server queried

status (string)

API status message

date_to (string)

End date used

date_from (string)

Start date used

preprints (object[], required)

Normalized bioRxiv or medRxiv preprint records

source_url (string, required)

bioRxiv API URL fetched

total_matches (integer)

Total API matches reported for the interval

new_papers_count (integer)

New paper count reported for the interval
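Because each run returns one bounded response, walking a larger date window means feeding a new `cursor` into the next run. A minimal Python sketch of that step follows. It assumes `cursor` is a zero-based start offset and that `total_matches` counts every record in the interval; the helper name `next_cursor` is hypothetical.

```python
def next_cursor(output):
    """Return the cursor for the next run, or None when nothing is left.

    Assumes `cursor` is a zero-based start offset and `total_matches`
    covers the whole interval; both are readings of the field
    descriptions, not confirmed behavior.
    """
    cursor = output.get("cursor") or 0
    count = output.get("count", 0)
    total = output.get("total_matches")

    if count == 0:
        # An empty page means there is nothing further to fetch.
        return None

    advanced = cursor + count
    if total is not None and advanced >= total:
        return None
    return advanced
```

A caller would repeat the run with `cursor` set to the returned value until the helper returns `None`, pausing between runs to stay within upstream rate limits.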

Examples

biorxiv-neuroscience-window

{
  "mode": "date_range",
  "limit": 2,
  "server": "biorxiv",
  "date_to": "2024-06-03",
  "date_from": "2024-06-01",
  "include_abstract": false
}

biorxiv-doi

{
  "doi": "10.1101/2024.05.28.596311",
  "mode": "doi",
  "server": "biorxiv",
  "include_abstract": false
}

Use cases

Preprint surveillance

Track new bioRxiv or medRxiv records in a date window and return normalized DOI, title, author, category, abstract, and PDF/JATS links.

Biomedical and life-science monitoring

Filter by server and category, then pipe fresh preprint metadata into review queues, alerts, dashboards, or research-agent workflows.

RAG corpus triage

Collect abstracts, DOIs, source URLs, PDF URLs, licenses, and JATS XML links before deciding which preprints should be downloaded or embedded.
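The triage step can be sketched as a filter over the returned `preprints` rows. This Python example is illustrative only: the row keys `doi`, `license`, and `pdf_url` and the license labels in `OPEN_LICENSES` are assumptions, since this page does not list the exact key names or label values of a normalized row.

```python
# Assumed license labels; check real output before relying on them.
OPEN_LICENSES = frozenset({"cc_by", "cc0"})


def select_for_download(preprints, allowed_licenses=OPEN_LICENSES):
    """Keep rows with an allowed license and a PDF link.

    The keys "doi", "license", and "pdf_url" are hypothetical names for
    the normalized row fields.
    """
    selected = []
    for row in preprints:
        license_label = (row.get("license") or "").lower()
        pdf_url = row.get("pdf_url")
        if license_label in allowed_licenses and pdf_url:
            selected.append({"doi": row.get("doi"), "pdf_url": pdf_url})
    return selected
```

Filtering on license before any download keeps the decision cheap, since it needs only the metadata already returned by the run.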

FAQ

Does bioRxiv and medRxiv Preprints Scraper require an API key?

No. Version 0.1 uses the public bioRxiv API and keeps each run to one bounded request. Upstream rate limits can still apply to aggressive use.

Does it download PDFs or parse full text?

No. It returns metadata plus preprint, PDF, and JATS XML URLs when the public API provides enough information. It does not download PDFs, parse JATS XML, or crawl article pages.

Is this tool affiliated with bioRxiv, medRxiv, or Cold Spring Harbor Laboratory?

No. This Better Fetch tool uses public API data and is not endorsed by or affiliated with bioRxiv, medRxiv, openRxiv, or Cold Spring Harbor Laboratory.

Use it anywhere

MCP (Claude, Cursor, any client)

# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"

# Then ask for the tool by name: biorxiv_medrxiv_preprints_scraper

REST

curl -sS -X POST "https://betterfetch.co/api/tools/biorxiv_medrxiv_preprints_scraper/run" \
  -H "Authorization: Bearer bf_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"input": {"mode":"date_range","limit":2,"server":"biorxiv","date_to":"2024-06-03","date_from":"2024-06-01","include_abstract":false}}'
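The same REST call can be prepared from Python with only the standard library. This is a minimal sketch mirroring the curl command above; the helper name `build_run_request` is hypothetical, and treating the response body as JSON is an assumption.

```python
import json
import urllib.request

RUN_URL = ("https://betterfetch.co/api/tools/"
           "biorxiv_medrxiv_preprints_scraper/run")


def build_run_request(api_key, tool_input):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"input": tool_input}).encode("utf-8")
    return urllib.request.Request(
        RUN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Sending is left to the caller, for example:
#   req = build_run_request("bf_your_key_here", {"mode": "doi", ...})
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       result = json.load(resp)  # assumes a JSON response body
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are spent.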

Run locally

git clone https://github.com/better-fetch/tools && cd tools/tools/biorxiv-medrxiv-preprints-scraper && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"mode":"date_range","limit":2,"server":"biorxiv","date_to":"2024-06-03","date_from":"2024-06-01","include_abstract":false}'