Better Fetch

· Paul Crossland

Browser Extensions Are Fetch State

Puppeteer's new extension support is a reminder to log, test, and isolate extension-influenced browser fetches.

A browser-backed fetch is not just a request plus a renderer. It is a full client environment: network stack, cookie jar, storage partition, JavaScript runtime, permissions, installed features, and sometimes extensions. That last part is easy to ignore in production crawling because most automation workers run a clean browser profile. But extensions are creeping into more legitimate automation workflows: internal data collection, authenticated business tooling, consent handling, debugging, accessibility checks, and agent browsing systems that need the same client-side capabilities a human operator uses.

This week made that operationally relevant. On July 1, the Puppeteer project released puppeteer-core v25.3.0, adding support for installing extensions for browser contexts. The same release also fixed duplicate Set-Cookie normalization and included browser roll updates. One day earlier, the Chrome team published a Stable Channel Update for Desktop, moving the stable build forward again on Windows, macOS, and Linux. Puppeteer also shipped v25.2.1 on June 24, including a fix for a regression when using Puppeteer with untrusted sessions.

The synthesis is straightforward: browser automation surfaces are becoming more realistic and more configurable while browsers themselves keep changing quickly. For Better Fetch-style infrastructure, that means extension state belongs in the fetch contract. If an extension can observe, alter, block, annotate, or trigger network behavior in a real browser session, then a crawler that uses extensions must treat them as first-class state rather than incidental setup.

Why extensions matter to fetching

Extensions can affect a page without looking like application code. Depending on permissions and implementation, they may inject content scripts, alter request headers, block or redirect requests, observe cookies, use extension storage, open side panels, add service workers, or call background APIs. Even a benign internal extension can change timing, DOM readiness, resource loading, or the shape of pages that extraction code sees.

That matters because fetch systems usually explain outcomes with a smaller vocabulary:

  • Target site changed.
  • Proxy or region changed.
  • Session expired.
  • Browser version changed.
  • Extractor broke.
  • Anti-bot or access-control policy changed.

Those categories are useful, but incomplete. If one worker pool has an extension installed and another does not, each can mislead in its own way. The extension-enabled pool may be closer to a human operator's browser for a specific authenticated workflow. The clean pool may be closer to a neutral measurement environment. Neither is universally more correct. They are different client states.

For operators, the risk is not only that extensions break pages. The bigger risk is silent attribution error. A price field disappears, a consent prompt is skipped, a JSON endpoint is never called, or a login wall appears only on one shard. If extension state is not logged, the incident gets misclassified as site volatility, browser drift, or proxy quality.

Do not make extension state implicit

The new Puppeteer capability is useful because it makes extension installation more programmable. That also makes accidental drift easier. A local debugging helper can become a production dependency. A test extension can leak into a shared browser image. A security or compliance extension can update independently from the crawler release. A browser context can be created with a different extension set than the one assumed by the extraction code.

A production fetch record should therefore include extension metadata beside browser and automation metadata. At minimum, log:

  • Automation library and version, such as puppeteer-core 25.3.0.
  • Browser product, full version, channel, and executable path.
  • Whether the browser context was extension-enabled.
  • Extension identifiers, names, and versions.
  • Extension source digest or package digest where internally controlled.
  • Permissions requested by each extension.
  • Whether extensions were loaded from an immutable build artifact or a mutable path.
  • Context type: fresh, persisted profile, authenticated profile, or replay fixture.
  • Session class, region, locale, timezone, and consent state.

Do not store secrets or cookie values in these logs. Store identifiers, hashes, and redacted metadata. The goal is to make outcome differences explainable without creating a new sensitive-data sink.
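
As a minimal sketch of such a record, the TypeScript below shows one possible shape. Every type, field, and function name here is hypothetical and chosen for illustration; none of it is a Puppeteer API. The builder keeps cookie names and drops cookie values, in line with the redaction rule above. The browser version, executable path, extension details, and session values in the example are placeholders.

```typescript
// Hypothetical record shapes for illustration; these are not Puppeteer types.
interface ExtensionInfo {
  id: string;
  name: string;
  version: string;
  packageDigest: string | null; // package digest, when the extension is internally controlled
  permissions: string[];
  immutableArtifact: boolean; // false when loaded from a mutable path
}

type ContextType = "fresh" | "persisted" | "authenticated" | "replay-fixture";

interface FetchRecordBase {
  automation: { library: string; version: string };
  browser: { product: string; version: string; channel: string; executablePath: string };
  contextType: ContextType;
  extensions: ExtensionInfo[];
  session: { sessionClass: string; region: string; locale: string; timezone: string; consentState: string };
}

interface FetchRecord extends FetchRecordBase {
  extensionsEnabled: boolean;
  cookieNames: string[]; // names only, never values
}

// Build a loggable record: cookie values are dropped, names are sorted for stable diffs.
function buildFetchRecord(
  base: FetchRecordBase,
  cookies: { name: string; value: string }[]
): FetchRecord {
  const cookieNames = cookies.map((c) => c.name).sort();
  return { ...base, extensionsEnabled: base.extensions.length > 0, cookieNames };
}

// Example with placeholder values.
const record = buildFetchRecord(
  {
    automation: { library: "puppeteer-core", version: "25.3.0" },
    browser: { product: "chrome", version: "0.0.0.0", channel: "stable", executablePath: "/opt/browser/chrome" },
    contextType: "authenticated",
    extensions: [
      {
        id: "example-extension-id",
        name: "internal-helper",
        version: "1.4.2",
        packageDigest: "sha256:<digest>",
        permissions: ["cookies", "storage"],
        immutableArtifact: true,
      },
    ],
    session: { sessionClass: "partner-portal", region: "eu-west", locale: "en-GB", timezone: "Europe/London", consentState: "accepted" },
  },
  [
    { name: "sid", value: "do-not-log" },
    { name: "consent", value: "accepted-v2" },
  ]
);
console.log(JSON.stringify(record));
```

Deriving the extension-enabled flag from the extension list is one design choice. A system that can enable extension support with an empty set may prefer to log the flag explicitly.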

Build two baselines: clean and instrumented

For most crawling, a clean browser context should remain the default. It is easier to reason about, easier to reproduce, and less likely to introduce accidental behavior. Extension-enabled contexts should be deliberate and named.

A practical deployment model is to run two baselines:

  1. A clean baseline with no extensions, used for neutral measurement and broad crawling.
  2. An instrumented baseline with a known extension set, used only where the workflow requires it.
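
One way to keep extension-enabled contexts deliberate and named is to make the clean lane the fallback and require every instrumented workflow to be listed explicitly. The sketch below assumes a hypothetical lane configuration; the lane names, workflow names, and extension identifier are invented for illustration.

```typescript
// Hypothetical lane definitions; names and fields are illustrative.
interface Lane {
  name: string;
  extensionIds: string[]; // pinned extension set; empty for the clean lane
}

interface LaneAssignment {
  workflow: string;
  lane: Lane;
}

const CLEAN_LANE: Lane = { name: "clean", extensionIds: [] };

// A workflow gets an instrumented lane only if it is listed explicitly.
// Anything unlisted falls back to the clean baseline.
function selectLane(workflow: string, assignments: LaneAssignment[]): Lane {
  for (const assignment of assignments) {
    if (assignment.workflow === workflow) {
      return assignment.lane;
    }
  }
  return CLEAN_LANE;
}

const assignments: LaneAssignment[] = [
  {
    workflow: "partner-portal",
    lane: { name: "instrumented-internal", extensionIds: ["example-extension-id"] },
  },
];

console.log(selectLane("broad-crawl", assignments).name);
console.log(selectLane("partner-portal", assignments).name);
```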

Compare the two regularly. The comparison should not ask only whether the page loaded. It should diff the fetch evidence:

  • Main document status and final URL.
  • Redirect count and redirect destinations.
  • Request and response header presence, with sensitive values redacted.
  • Duplicate or unusual Set-Cookie handling.
  • Cookie names added, removed, or changed.
  • Local storage and session storage keys touched.
  • Number and hosts of XHR, fetch, WebSocket, and event-source requests.
  • DOM readiness timing and extraction completion timing.
  • Critical selector presence and extracted field completeness.
  • Screenshot hash or visual diff for pages where layout affects extraction.
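
A comparison like this can be sketched as a function that takes the evidence from both lanes and returns a list of human-readable findings. The evidence shape below is hypothetical and covers only a subset of the points above. It compares names and counts and never compares header or cookie values.

```typescript
// Hypothetical evidence shape covering a subset of the comparison points above.
interface FetchEvidence {
  status: number;
  finalUrl: string;
  redirectCount: number;
  cookieNames: string[];
  storageKeys: string[];
  requestHosts: string[];
  missingSelectors: string[]; // critical selectors that were not found
}

// Names present in `a` but not in `b`, sorted for stable output.
function onlyIn(a: string[], b: string[]): string[] {
  return a.filter((name) => b.indexOf(name) === -1).sort();
}

function diffNames(label: string, clean: string[], instrumented: string[]): string[] {
  const findings: string[] = [];
  const added = onlyIn(instrumented, clean);
  const removed = onlyIn(clean, instrumented);
  if (added.length > 0) findings.push(`${label} added: ${added.join(", ")}`);
  if (removed.length > 0) findings.push(`${label} removed: ${removed.join(", ")}`);
  return findings;
}

// Returns one finding per difference; an empty list means the lanes agree.
function diffEvidence(clean: FetchEvidence, instrumented: FetchEvidence): string[] {
  const findings: string[] = [];
  if (clean.status !== instrumented.status) {
    findings.push(`status: ${clean.status} -> ${instrumented.status}`);
  }
  if (clean.finalUrl !== instrumented.finalUrl) {
    findings.push(`final URL: ${clean.finalUrl} -> ${instrumented.finalUrl}`);
  }
  if (clean.redirectCount !== instrumented.redirectCount) {
    findings.push(`redirect count: ${clean.redirectCount} -> ${instrumented.redirectCount}`);
  }
  return findings.concat(
    diffNames("cookie names", clean.cookieNames, instrumented.cookieNames),
    diffNames("storage keys", clean.storageKeys, instrumented.storageKeys),
    diffNames("request hosts", clean.requestHosts, instrumented.requestHosts),
    diffNames("missing selectors", clean.missingSelectors, instrumented.missingSelectors)
  );
}
```

Returning findings as strings keeps the sketch short. A production version would more likely return structured deltas so that dashboards can group them.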

The point is not to ban extensions. The point is to know when the extension is part of the data-producing system. If an extension improves an authenticated operational workflow, keep it. But promote it with the same discipline as a browser upgrade or extractor change.

Treat cookie and header behavior as high-risk

The July 1 Puppeteer release also fixed duplicate Set-Cookie normalization. That is a good reminder that cookie handling is not a boring detail in browser-grade fetching. Cookies determine whether a session sees a region, language, consent state, logged-in view, bot-management challenge, or cached experiment bucket. Extensions that observe cookies or alter request headers act on that same state.

When extension-enabled fetches are in scope, add targeted tests for cookie and header behavior:

  • Does the extension read, write, or react to cookies?
  • Does it change request headers or add diagnostic headers?
  • Does it block third-party resources that set state later used by the page?
  • Does it alter navigation timing by delaying requests?
  • Does it affect service-worker registration or cache behavior?
  • Does it change results between fresh and warmed sessions?

Keep these tests policy-safe. They should verify your own client behavior and target compatibility. They should not attempt to bypass access controls, defeat bot defenses, or disguise automated traffic. If a site blocks or restricts access, the correct operational response is to respect that policy or use authorized access paths.
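
One policy-safe test for the header question compares the request header names seen in each lane against an explicit allowlist of extras the extension is expected to add. The sketch below is an assumption about how such a check might look. The function name and the example header names are hypothetical. Header names are lowercased before comparison because HTTP header names are case-insensitive, and only names are inspected, never values.

```typescript
// Hypothetical check: returns header names that appear only in the
// extension-enabled lane and are not on the allowlist of expected extras.
function unexpectedHeaderNames(
  cleanHeaders: string[],
  instrumentedHeaders: string[],
  allowedExtras: string[]
): string[] {
  const normalize = (names: string[]): string[] => names.map((n) => n.toLowerCase());
  const clean = normalize(cleanHeaders);
  const allowed = normalize(allowedExtras);
  return normalize(instrumentedHeaders)
    .filter((n) => clean.indexOf(n) === -1 && allowed.indexOf(n) === -1)
    .filter((n, i, all) => all.indexOf(n) === i) // drop duplicates
    .sort();
}

// Example: the extension is expected to add one diagnostic header and nothing else.
const surprises = unexpectedHeaderNames(
  ["Accept", "User-Agent"],
  ["accept", "user-agent", "X-Internal-Diagnostic", "X-Other", "x-other"],
  ["x-internal-diagnostic"]
);
console.log(surprises);
```

A non-empty result fails the test and points at a header the extension was not expected to send.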

Add extension fixtures to the canary suite

Browser automation teams already need canaries for browser version drift. Extension support adds another dimension. A useful fixture set includes:

  • A page with no extension interaction, to prove the clean and extension contexts match.
  • A page where the extension intentionally injects a visible marker into a controlled test domain.
  • A page with multiple Set-Cookie headers and redirects.
  • A page that uses local storage, session storage, and IndexedDB.
  • A page with service-worker registration.
  • A page that loads cross-origin resources and APIs.
  • A page that exercises the extractor's critical selectors.

Run the same fixtures through clean and extension-enabled contexts on every automation-library upgrade, browser rollout, container image change, and extension release. Store the artifacts: network trace, final HTML hash, screenshot hash, storage delta, cookie-name delta, and extraction result. If a fixture's artifacts change only in the extension lane, you have a contained extension incident rather than a vague crawler regression.
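
The lane comparison can be reduced to a small classifier. The sketch below assumes each lane stores a baseline and a current set of artifacts for a fixture. The artifact fields are a subset of those listed above, and the type names and verdict labels are hypothetical.

```typescript
// Hypothetical artifact summary for one fixture run in one lane.
interface CanaryArtifacts {
  htmlHash: string;
  screenshotHash: string;
  cookieNames: string[];
  extractionOk: boolean;
}

interface LaneRun {
  baseline: CanaryArtifacts; // stored from the last accepted run
  current: CanaryArtifacts; // produced by the run under test
}

type Verdict = "no-change" | "extension-incident" | "shared-change" | "clean-only-change";

// Cookie names are compared as sorted copies so ordering alone is not a change.
function artifactsChanged(baseline: CanaryArtifacts, current: CanaryArtifacts): boolean {
  const names = (a: CanaryArtifacts): string => a.cookieNames.slice().sort().join("\n");
  return (
    baseline.htmlHash !== current.htmlHash ||
    baseline.screenshotHash !== current.screenshotHash ||
    names(baseline) !== names(current) ||
    baseline.extractionOk !== current.extractionOk
  );
}

function classifyFixture(clean: LaneRun, extension: LaneRun): Verdict {
  const cleanChanged = artifactsChanged(clean.baseline, clean.current);
  const extensionChanged = artifactsChanged(extension.baseline, extension.current);
  if (cleanChanged && extensionChanged) return "shared-change";
  if (extensionChanged) return "extension-incident";
  if (cleanChanged) return "clean-only-change";
  return "no-change";
}
```

A "shared-change" verdict says only that both lanes moved together. It does not distinguish a site change from a browser or library change, so that attribution still needs the version metadata in the fetch record.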

Operational checklist

Before using extensions in production fetches, answer these questions:

  • Which workflows require extensions, and which must remain clean?
  • Is the extension package pinned by version and digest?
  • Are extension permissions reviewed and minimized?
  • Is extension state logged in every fetch trace?
  • Can workers be grouped by extension set in dashboards and alerts?
  • Can a failed fetch be replayed with the exact same browser, automation library, extension set, region, locale, and session class?
  • Do canaries compare clean and extension-enabled contexts before rollout?
  • Is there a rollback path that removes or downgrades the extension without changing unrelated crawler code?
  • Are sensitive values redacted from extension, cookie, and request logs?
  • Is the workflow authorized by the target site or business relationship?

The last question matters. Better browser automation should make data systems more reliable and auditable, not more evasive. Extension-enabled contexts are appropriate when they represent an authorized browser environment or internal diagnostic tool. They are not a license to defeat access controls.

Puppeteer's new browser-context extension support is a small release-note item with a large operational lesson. The more closely automation resembles real browsers, the more state it can carry. Production fetch systems need to name that state, pin it, log it, test it, and separate it from clean baselines. Otherwise, the next extraction failure may not be a site change at all. It may be an invisible extension deployment hiding inside the browser.