YouTube Transcript
Turn a public YouTube video URL or ID into its full transcript: timestamped caption segments plus a single joined text block, along with the caption language and whether the track is auto-generated.
Overview
YouTube Transcript turns any public YouTube video that has captions into clean, structured text. It reads the video's caption tracks through the Better Fetch stealth engine, selects the best track for your language, and returns both timestamped segments and one joined transcript string. That makes it a good fit for summarization, search indexing, retrieval-augmented generation, or content repurposing, with no media download required.
Last validated: Jul 3, 2026
Input
url (string): Public YouTube video URL (watch, youtu.be, shorts, or embed form). Provide either url or video_id.
lang (string): Preferred caption language code, e.g. en or es. If no matching track exists, the tool falls back to a manual English track, then any manual track, then auto-generated captions.
video_id (string): 11-character YouTube video id, as an alternative to url.
max_segments (integer, default: 3000): Maximum number of caption segments to return. The joined text field is always built from exactly the returned segments.
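The url input accepts the four URL shapes listed above. A script that prefers to send video_id, or that wants to deduplicate requests before calling the tool, could normalize URLs itself. The POSIX shell function below is a hypothetical helper, not part of Better Fetch and not the tool's own parser; it assumes the id is the v query parameter on watch URLs and the first path segment after youtu.be/, /shorts/, or /embed/ on the other forms.

```shell
# Hypothetical helper (not part of Better Fetch): print the 11-character
# video id from a watch, youtu.be, shorts, or embed URL, or return status 1.
yt_video_id() {
  url=$1
  case $url in
    *youtube.com/watch*)
      # Take the value of the v query parameter (after "?" or "&")
      id=$(printf '%s\n' "$url" | sed -n 's/.*[?&]v=\([^&#]*\).*/\1/p') ;;
    *youtu.be/*)
      id=${url#*youtu.be/} ;;
    *youtube.com/shorts/*)
      id=${url#*/shorts/} ;;
    *youtube.com/embed/*)
      id=${url#*/embed/} ;;
    *)
      return 1 ;;
  esac
  # Drop any trailing query string, fragment, or extra path segment
  id=${id%%[?&#/]*}
  # A video id is exactly 11 characters from [A-Za-z0-9_-]
  case $id in
    *[!A-Za-z0-9_-]*) return 1 ;;
  esac
  [ ${#id} -eq 11 ] || return 1
  printf '%s\n' "$id"
}
```

For example, yt_video_id 'https://youtu.be/dQw4w9WgXcQ?t=42' prints dQw4w9WgXcQ, while a non-YouTube URL makes it return status 1.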
Output
url (string, required): Canonical watch URL that was used
lang (string): Language code of the chosen caption track
text (string, required): All segments joined into one transcript string
title (string): Video title
author (string): Channel name
segments (object[], required): Timestamped caption cues, each with a start time and duration in seconds
video_id (string, required): YouTube video id
is_generated (boolean): True if the track was auto-generated by automatic speech recognition (ASR)
segment_count (integer, required): Number of caption segments returned
duration_seconds (number): Video length in seconds
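Both duration_seconds and each segment's start time are plain numbers of seconds. For display, a script might render them as a clock-style timestamp. A minimal sketch, assuming POSIX shell; fmt_ts is a hypothetical helper, and it truncates fractional seconds rather than rounding because shell arithmetic is integer-only.

```shell
# Hypothetical helper: format a number of seconds (e.g. duration_seconds or a
# segment's start) as M:SS, or as H:MM:SS once it reaches one hour.
fmt_ts() {
  s=${1%%.*}        # drop the fractional part: "212.4" -> "212"
  s=${s:-0}         # an input like ".5" has an empty integer part; use 0
  h=$((s / 3600))
  m=$((s % 3600 / 60))
  sec=$((s % 60))
  if [ "$h" -gt 0 ]; then
    printf '%d:%02d:%02d\n' "$h" "$m" "$sec"
  else
    printf '%d:%02d\n' "$m" "$sec"
  fi
}
```

For example, fmt_ts 212.4 prints 3:32 and fmt_ts 3725 prints 1:02:05.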
Examples
transcript-by-url
{
"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
}
FAQ
Does it work on any YouTube video?
It works on public videos that have captions, either human-written or YouTube's auto-generated track. Videos with captions disabled, as well as private or age-restricted videos, cannot be transcribed.
Can I choose the transcript language?
Yes. Pass a language code such as en or es in the lang input and the tool prefers a matching track, falling back to a manual English track, then any manual track, then auto-generated captions.
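The fallback order can be spelled out as a small selection function. This is a sketch of the documented order under stated assumptions, not the tool's implementation: pick_track is hypothetical, tracks are passed as lang:kind words with kind either manual or asr, language codes are compared exactly, and a track in the requested language wins even if it is auto-generated (a manual one is preferred when both exist).

```shell
# Hypothetical sketch of the fallback order. Arguments: the requested
# language, then the available tracks as "lang:kind" words, where kind is
# "manual" or "asr" (auto-generated). Prints the chosen track.
pick_track() {
  want=$1; shift
  # 1) Requested language: manual first, then auto-generated (assumption)
  for kind in manual asr; do
    for t in "$@"; do
      [ "$t" = "$want:$kind" ] && { printf '%s\n' "$t"; return 0; }
    done
  done
  # 2) A manual English track
  for t in "$@"; do
    [ "$t" = "en:manual" ] && { printf '%s\n' "$t"; return 0; }
  done
  # 3) Any manual track, then 4) any auto-generated track
  for kind in manual asr; do
    for t in "$@"; do
      case $t in
        *:"$kind") printf '%s\n' "$t"; return 0 ;;
      esac
    done
  done
  return 1  # no caption tracks at all
}
```

For example, pick_track ja fr:manual en:manual ko:asr prints en:manual, because no Japanese track exists and a manual English one does.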
What is the difference between the segments and the text field?
The segments field holds timestamped caption cues, each with a start time and duration in seconds, which is useful for citation and search. The text field is every segment joined into one string, convenient for summarization prompts.
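Because every segment carries a start time in seconds, a citation can link to the exact moment in the video. YouTube watch URLs accept a t parameter in seconds; cite_url below is a hypothetical helper that assumes you pass video_id and a segment's start from the output, and it drops any fractional part.

```shell
# Hypothetical helper: build a watch URL that starts playback at a segment.
# $1 = video_id from the output, $2 = the segment's start in seconds.
# The t parameter is given in whole seconds, so the fraction is dropped.
cite_url() {
  start=${2%%.*}
  printf 'https://www.youtube.com/watch?v=%s&t=%ss\n' "$1" "${start:-0}"
}
```

For example, cite_url dQw4w9WgXcQ 43.2 prints https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=43s.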
Use it anywhere
MCP (Claude, Cursor, any client)
# Add the Better Fetch MCP connector (or paste the URL into
# Claude → Settings → Connectors → Add custom connector):
claude mcp add --transport http better-fetch https://betterfetch.co/api/mcp \
  --header "Authorization: Bearer bf_your_key_here"
# Then ask for the tool by name: youtube_transcript
REST
curl -sS -X POST "https://betterfetch.co/api/tools/youtube_transcript/run" \
-H "Authorization: Bearer bf_your_key_here" \
-H "Content-Type: application/json" \
-d '{"input": {"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ"}}'
Run locally
# git cannot clone a /tree/ subdirectory URL: clone the repository, then enter the tool's directory
git clone https://github.com/better-fetch/tools && cd tools/tools/youtube-transcript && npm i
BETTER_FETCH_API_KEY=bf_your_key_here npx bf-tool run --input '{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
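The REST call and bf-tool run accept the same input fields, wrapped in {"input": ...} for REST and passed bare to --input locally. To request a transcript by video_id with a preferred language, a script could assemble that object as below. bf_input is a hypothetical helper; it performs no JSON escaping, which is only safe under the assumption that the id and language code contain no quotes or backslashes.

```shell
# Hypothetical helper: build the youtube_transcript input object from a
# video id ($1), an optional language code ($2), and an optional segment
# limit ($3). Assumes the values need no JSON escaping (no quotes/backslashes).
bf_input() {
  json="{\"video_id\":\"$1\""
  [ -n "$2" ] && json="$json,\"lang\":\"$2\""
  [ -n "$3" ] && json="$json,\"max_segments\":$3"
  printf '%s}\n' "$json"
}
```

For example, bf_input dQw4w9WgXcQ es prints {"video_id":"dQw4w9WgXcQ","lang":"es"}, which could be sent as -d "{\"input\": $(bf_input dQw4w9WgXcQ es)}" in the curl command above, or passed directly as --input "$(bf_input dQw4w9WgXcQ es)" to bf-tool run.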