Better Fetch

Paul Crossland

Replay Fetch Metadata, Not Just URLs

Internal APIs often check Sec-Fetch-* context. Capture request intent before replaying browser traffic outside the page.

When you discover a clean JSON endpoint behind a page, do not copy the URL and call it a day. Modern browsers send Fetch Metadata headers that tell servers whether a request came from a navigation, an image, a script, or a same-origin API call. If your replay drops that context, a request that worked reliably in the browser can come back as a confusing 403, an empty response, or a response served from a different cache path.

The browser is sending intent

Fetch Metadata is a small family of Sec-Fetch-* request headers. MDN describes them as forbidden request headers, which means application JavaScript cannot set them directly; the browser owns them. The W3C draft defines headers such as Sec-Fetch-Site, Sec-Fetch-Mode, Sec-Fetch-Dest, and Sec-Fetch-User so a server can understand the request context.

That context matters for extraction work. A page load might send Sec-Fetch-Mode: navigate and Sec-Fetch-Dest: document. A JSON call made by the app might send Sec-Fetch-Mode: cors, Sec-Fetch-Dest: empty, and Sec-Fetch-Site: same-origin. Those are different contracts, even when the same browser profile and cookie jar are involved.
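As a minimal sketch of how those two contracts differ in a capture log, the Python helper below labels a request from its Fetch Metadata headers. The function name describe_fetch_context and the label strings are hypothetical, chosen only for illustration; the header names and values are the ones described above.

```python
from typing import Mapping


def describe_fetch_context(headers: Mapping[str, str]) -> str:
    """Label a captured request by its Fetch Metadata headers (illustrative only)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    mode = h.get("sec-fetch-mode")
    dest = h.get("sec-fetch-dest")
    site = h.get("sec-fetch-site")

    if mode is None and dest is None and site is None:
        # No context at all, e.g. a plain HTTP client or an older browser.
        return "no-fetch-metadata"
    if mode == "navigate" and dest == "document":
        return "page-navigation"
    if dest == "empty" and mode in ("cors", "same-origin", "no-cors"):
        # fetch() and XHR calls report an empty destination.
        return f"api-call ({site or 'unknown-site'})"
    return f"subresource ({dest or 'unknown-dest'})"


page_load = {
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Site": "none",
}
json_call = {
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Site": "same-origin",
}

print(describe_fetch_context(page_load))
print(describe_fetch_context(json_call))
```

A replay that sends none of these headers lands in the first branch, which is already a different shape from either browser request.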

Security teams are using these headers

The OWASP CSRF Prevention Cheat Sheet recommends Fetch Metadata as a defense-in-depth signal for rejecting suspicious cross-site requests. The W3C draft frames the same idea as a way for servers to make better isolation decisions before spending work on a response.
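To make the server side concrete, here is a rough sketch of the kind of check such a policy might run. It is loosely modeled on commonly published Fetch Metadata policies, not on any particular site's logic; the function name allow_request and the choice to let requests without the header through are assumptions, and a real deployment may be stricter.

```python
from typing import Mapping


def allow_request(method: str, headers: Mapping[str, str]) -> bool:
    """Hypothetical Fetch Metadata policy check (a sketch, not a real site's rules)."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    site = h.get("sec-fetch-site")

    # Assumption: requests without the header are passed to other defenses.
    if site is None:
        return True

    # Same-origin, same-site, and user-initiated (none) requests are allowed.
    if site in ("same-origin", "same-site", "none"):
        return True

    # Cross-site requests are allowed only as plain top-level GET navigations.
    is_simple_navigation = (
        h.get("sec-fetch-mode") == "navigate"
        and method.upper() == "GET"
        and h.get("sec-fetch-dest") not in ("object", "embed")
    )
    return is_simple_navigation


in_page_call = {"Sec-Fetch-Site": "same-origin", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty"}
cross_site_call = {"Sec-Fetch-Site": "cross-site", "Sec-Fetch-Mode": "cors", "Sec-Fetch-Dest": "empty"}

print(allow_request("GET", in_page_call))
print(allow_request("GET", cross_site_call))
```

Under this sketch the in-page call passes and the cross-site call is rejected, even though both carry the same URL and method.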

For scrapers and data pipelines, that means a failed API replay is not always a proxy problem. It may be a context problem: the request no longer looks like the browser action that created it.

Capture the request shape with the endpoint

API discovery should save more than the URL. For each useful XHR or fetch call, record the request method, headers that describe browser context, response status, redirect chain, and whether the call happened before or after a challenge or login wall.

A practical capture record looks like this:

endpoint: /api/search?q=boots
method: GET
site_context: same-origin
mode: cors
destination: empty
cookies: reused from page session
referer: product search page
status: 200

You may not need to replay every header manually when you keep the request inside a real browser session. But the capture tells you why the call worked, and it gives you a checklist when you decide whether a cheaper HTTP replay is safe.
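One way to keep that record next to the endpoint is a small typed structure that serializes to JSON. This is only a sketch: the CaptureRecord class and the redirect_chain and after_challenge field names are hypothetical, mirroring the fields listed in the record above rather than any existing tool's schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CaptureRecord:
    """Hypothetical capture record for one discovered XHR or fetch call."""
    endpoint: str
    method: str
    site_context: str   # observed Sec-Fetch-Site
    mode: str           # observed Sec-Fetch-Mode
    destination: str    # observed Sec-Fetch-Dest
    status: int
    referer: str = ""
    cookies: str = ""   # a note on where cookies came from, not their values
    redirect_chain: List[str] = field(default_factory=list)
    after_challenge: bool = False  # seen after a challenge or login wall?


record = CaptureRecord(
    endpoint="/api/search?q=boots",
    method="GET",
    site_context="same-origin",
    mode="cors",
    destination="empty",
    status=200,
    referer="product search page",
    cookies="reused from page session",
)

# Store the record alongside the endpoint so later replays can be checked against it.
serialized = json.dumps(asdict(record), sort_keys=True)
print(serialized)
```

Storing a description of the cookies rather than their values keeps session secrets out of the capture log.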

Diagnose context before rotating infrastructure

When an endpoint changes from 200 in the page to 403 in replay, check the browser contract before changing regions or proxy pools:

  1. Was the API call same-origin in the page but cross-site in replay?
  2. Did the replay lose the original Referer or session cookies?
  3. Did the request mode or destination change from the browser-observed call?
  4. Did the API depend on a token or cookie set by a prior navigation?
  5. Did the response vary by request headers or cache key?

If the answer to any of these is yes, rotating infrastructure only adds noise. Recreate the browser flow, keep the session warm, and replay the API from the same context that discovered it.
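The first three questions can be partly automated by comparing the headers the browser sent with the headers the replay sent. The sketch below assumes both sets are available as plain dictionaries; the context_drift function, the CONTEXT_HEADERS list, and the shop.example URL are hypothetical names used for illustration.

```python
from typing import List, Mapping

# Headers that carry browser context in this sketch (an illustrative selection).
CONTEXT_HEADERS = ("sec-fetch-site", "sec-fetch-mode", "sec-fetch-dest", "referer", "cookie")


def context_drift(captured: Mapping[str, str], replayed: Mapping[str, str]) -> List[str]:
    """List context headers that differ between the browser call and the replay."""
    cap = {k.lower(): v for k, v in captured.items()}
    rep = {k.lower(): v for k, v in replayed.items()}
    findings = []
    for name in CONTEXT_HEADERS:
        before, after = cap.get(name), rep.get(name)
        if before == after:
            continue
        # Report only the kind of change, never the value, so cookies are not logged.
        if after is None:
            findings.append(f"{name}: dropped in replay")
        elif before is None:
            findings.append(f"{name}: added in replay")
        else:
            findings.append(f"{name}: changed")
    return findings


captured = {
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Dest": "empty",
    "Referer": "https://shop.example/search",
    "Cookie": "session=abc",
}
replayed = {"Referer": "https://shop.example/search"}

for finding in context_drift(captured, replayed):
    print(finding)
```

An empty result does not prove the replay is equivalent, since tokens set by a prior navigation or cache-key differences are not visible in a header diff, but a non-empty one is a reason to fix context before touching proxies.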

The practical rule

Treat Fetch Metadata as part of the extraction contract. Discover APIs with a browser, keep the request context attached to the endpoint, and downgrade to raw HTTP only after proving the lighter replay returns the same data.
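A simple way to prove parity is to compare the status and a fingerprint of the body from both paths before switching. The following is a sketch under stated assumptions: the responses are already available as a status code and a text body, and the body_fingerprint and replay_matches helpers are hypothetical names, not part of any library.

```python
import hashlib
import json


def body_fingerprint(body: str) -> str:
    """Hash a response body, canonicalizing JSON so key order does not matter."""
    try:
        canonical = json.dumps(json.loads(body), sort_keys=True, separators=(",", ":"))
    except ValueError:
        # Not JSON (or empty): fall back to hashing the raw text.
        canonical = body
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def replay_matches(browser_status: int, browser_body: str,
                   replay_status: int, replay_body: str) -> bool:
    """True when the lighter replay returned the same status and the same data."""
    return (
        browser_status == replay_status
        and body_fingerprint(browser_body) == body_fingerprint(replay_body)
    )


# Same data with different key order and whitespace counts as a match.
print(replay_matches(200, '{"a": 1, "b": 2}', 200, '{"b":2,"a":1}'))
# A 403 with an empty body does not.
print(replay_matches(200, '{"a": 1}', 403, ""))
```

Responses that embed timestamps or request IDs will never hash identically, so in practice those fields would need to be stripped before comparing.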

That habit keeps production pipelines from mislabeling security policy as bot blocking. It also gives engineers a clearer escalation path: fix context first, then investigate rate limits, geo policy, or bot-wall scoring.

Sources: MDN Fetch metadata request header, MDN Sec-Fetch-Site, W3C Fetch Metadata Request Headers, OWASP CSRF Prevention Cheat Sheet.