Paul Crossland
Stop Waiting on Network Idle
Network idle is a weak readiness signal for modern apps. Wait for the data, selector, or API response that proves the page is useful.
A browser render is only valuable after the page has produced the data you came to fetch. For modern apps, networkidle is usually the wrong definition of "ready"; production fetch pipelines should wait for a specific DOM selector, JSON response, or structured-data marker instead.
Network quiet is not page readiness
networkidle sounds objective: wait until the browser stops making requests. In practice, it mixes too many unrelated signals. Analytics beacons, long polling, streaming responses, service-worker updates, prefetches, and ads can all keep a page noisy after the useful content has rendered. Other pages go quiet before client-side data hydration has finished.
The automation libraries themselves hint at this. Puppeteer's Page.waitForNetworkIdle() waits until the network has been idle for at least the configured idle time, which says nothing about whether the content you need has rendered. Playwright's load-state docs note that explicit load-state waiting is usually unnecessary because actions and locators auto-wait for the element being interacted with, and they mark the networkidle state as discouraged.
For scraping and extraction, the same principle applies: wait for evidence, not silence.
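As a minimal sketch of waiting for evidence, the function below navigates and then blocks on a selector instead of on network quiet. It assumes `page` behaves like a Playwright sync-API Page; the function name and default timeout are illustrative, not part of any library.

```python
def fetch_when_ready(page, url, selector, timeout_ms=30_000):
    """Navigate, then wait for a selector that proves the content rendered.

    Assumption: `page` is a Playwright sync-API Page (or a compatible object).
    """
    # Return from navigation once the HTML is parsed; do not wait for
    # the network to go quiet.
    page.goto(url, wait_until="domcontentloaded")
    # state="attached" only requires the node to exist in the DOM, which also
    # works for invisible evidence such as script[type="application/ld+json"].
    # Playwright raises its TimeoutError if the selector never appears.
    page.wait_for_selector(selector, state="attached", timeout=timeout_ms)
    return page.content()
```

A caller might use `fetch_when_ready(page, url, ".product-card")` and treat a raised timeout as an incomplete run rather than as a page to parse.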
Modern frameworks make silence even less reliable
Next.js documents navigation around prefetching, streaming, and client-side transitions. Those features are great for users, but they blur old crawler milestones. A page can render a shell, stream segments later, prefetch nearby routes, or update content after the load event.
MDN's DOMContentLoaded reference is a useful reminder that browser events describe parser and resource milestones, not business readiness. DOMContentLoaded, load, and network quiet answer browser questions. They do not answer whether the price table, job listing, search result, recipe card, or product JSON you need is actually present.
Build a readiness contract per page type
A robust browser-grade fetch should have a small readiness contract. For each target or template, define one or more of these checks:
- A stable selector exists, such as [data-product-id], .search-result, or script[type="application/ld+json"].
- A specific XHR or fetch endpoint returned a successful response.
- The extracted JSON has required keys and a minimum item count.
- A negative state appeared, such as "no results", "sign in", or a bot challenge.
- A maximum timeout expired and the run is recorded as incomplete, not silently accepted.
This turns rendering from a guess into a contract you can monitor.
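One way such a contract could look in code is sketched below. The `ReadinessContract` class, its field names, and the two helper functions are hypothetical and only illustrate the checks listed above; real targets will need their own keys and markers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadinessContract:
    """Hypothetical per-template contract; every field name is illustrative."""
    selector: Optional[str] = None           # e.g. ".search-result"
    response_url_part: Optional[str] = None  # substring of the data endpoint URL
    required_keys: tuple = ()                # keys the extracted JSON must have
    items_key: str = "items"                 # where the item list lives
    min_items: int = 1
    negative_markers: tuple = ("no results", "sign in")
    timeout_ms: int = 30_000


def check_payload(contract, payload):
    """Return (ok, reason) for an extracted JSON payload (a dict)."""
    missing = [k for k in contract.required_keys if k not in payload]
    if missing:
        return False, "missing keys: " + ", ".join(missing)
    items = payload.get(contract.items_key, [])
    if len(items) < contract.min_items:
        return False, f"only {len(items)} item(s), need {contract.min_items}"
    return True, "payload ok"


def find_negative_state(contract, page_text):
    """Return the first negative marker found in the page text, else None."""
    lowered = page_text.lower()
    for marker in contract.negative_markers:
        if marker in lowered:
            return marker
    return None
```

Keeping the reason string next to the boolean means a failed check can be logged and monitored instead of being collapsed into a generic timeout.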
Prefer API evidence when the page exposes it
Before spending more time tuning waits, inspect the network. Many dynamic pages hydrate from JSON endpoints that are cleaner than the rendered HTML. If the page calls an internal search API, product API, or listing endpoint, that response is usually the best readiness signal.
A practical sequence is:
open page -> capture network -> identify data API -> replay API when allowed -> render only when needed
When replaying is not allowed or needs browser cookies, keep the browser session, wait for the relevant response in the page, then extract the DOM or JSON payload. Either way, the readiness signal should be tied to the data source, not to global network quiet.
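Waiting for the relevant response inside the page might be sketched as follows. This assumes `page` behaves like a Playwright sync-API Page, whose `expect_response` accepts a predicate; the helper names and the `/api/search` fragment in the usage note are hypothetical.

```python
def data_response_matcher(url_part):
    """Build a predicate for a response-like object exposing `.url` and `.ok`."""
    def matches(response):
        # Require both the right endpoint and a successful (2xx) status.
        return bool(url_part in response.url and response.ok)
    return matches


def fetch_payload(page, url, url_part, timeout_ms=30_000):
    """Navigate and return the JSON body of the first matching data response.

    Assumption: `page` is a Playwright sync-API Page (or a compatible object).
    """
    # Start listening before navigation so the response cannot be missed.
    with page.expect_response(data_response_matcher(url_part),
                              timeout=timeout_ms) as info:
        page.goto(url, wait_until="domcontentloaded")
    # The matched response becomes available once the block has exited.
    return info.value.json()
```

For example, `fetch_payload(page, url, "/api/search")` returns once the search endpoint has answered, regardless of what analytics or prefetch traffic is still in flight.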
Log why a render finished
Every production fetch result should say why it stopped waiting. Log fields like these:
{
  "ready_by": "selector",
  "selector": ".product-card",
  "item_count": 24,
  "timeout_ms": 30000,
  "network_requests_seen": 47
}
Those logs make failures diagnosable. A timeout with zero product cards is a different problem from a timeout with 24 cards and a lingering analytics connection. A selector hit with a bot-wall banner is not a success. A JSON response with an empty array may be a valid zero-result page.
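The distinctions above can be turned into a small classifier over the log record. This is a sketch only: the outcome labels, the `"timeout"` value for `ready_by`, and the `negative_state` field are assumptions added for illustration and do not appear in the sample log.

```python
def classify_result(record):
    """Map a fetch log record (a dict) to an illustrative outcome label."""
    items = record.get("item_count", 0)
    # A bot wall is never a success, even if the selector matched.
    if record.get("negative_state") == "bot_challenge":
        return "blocked"
    if record.get("ready_by") == "timeout":
        # Content arrived but the readiness check never fired, versus
        # nothing useful arriving at all.
        return "content_present_on_timeout" if items > 0 else "incomplete"
    if items == 0:
        # Possibly a valid zero-result page; keep it distinct from failure.
        return "empty_result"
    return "success"
```

Counting these labels per target over time shows whether a template is drifting toward timeouts or challenges before extraction quality visibly drops.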
What to change in your fetch pipeline
Use networkidle only as a secondary guardrail, not as the primary success condition. For each recurring target, store a readiness contract beside the extraction logic. Start with one selector or response URL, then add negative-state checks as you observe failures.
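A hedged sketch of networkidle as a secondary guardrail: the selector decides success, and a short, optional idle wait is only recorded, never required. It assumes `page` behaves like a Playwright sync-API Page; the function name, the `network_settled` field, and the `timeout_error` parameter are illustrative. In real use you would likely pass Playwright's own `TimeoutError` class instead of the broad default.

```python
def wait_with_guardrail(page, selector, timeout_ms=30_000,
                        idle_grace_ms=2_000, timeout_error=Exception):
    """Wait for the selector first, then give the network a brief grace period.

    Assumption: `page` is a Playwright sync-API Page (or a compatible object).
    """
    # Primary success condition: this raises if the evidence never appears.
    page.wait_for_selector(selector, state="attached", timeout=timeout_ms)
    try:
        # Secondary guardrail: a short wait that is allowed to fail.
        page.wait_for_load_state("networkidle", timeout=idle_grace_ms)
        network_settled = True
    except timeout_error:
        # A noisy page (analytics, polling) is recorded, not treated as failure.
        network_settled = False
    return {
        "ready_by": "selector",
        "selector": selector,
        "network_settled": network_settled,
    }
```

Because the idle wait is capped by `idle_grace_ms`, a page with a lingering connection costs at most that grace period rather than the full timeout.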
The result is faster, cheaper, and more reliable rendering: fewer unnecessary waits, fewer false successes, and clearer debugging when the modern web does something asynchronous after the page looks finished.