How to Slash Scraping Bandwidth by 90% in 2026 (Save on Proxies!)
Here is the line item that quietly wrecks scraping budgets: residential proxy bandwidth. It is billed per gigabyte, often several dollars a gig, and a headless browser happily downloads every image, font, and tracker on a page — none of which your parser ever reads.
The fix is one of the highest-leverage things you can do as a scraper, and almost no beginner tutorial mentions it. By aborting heavy resources before they download, you routinely take a multi-megabyte page down to tens of kilobytes.
This guide shows you exactly where the bytes go, then how to block them in both Puppeteer and Playwright, how to measure the savings, and the one gotcha that can quietly break your parser.
TL;DR: On a typical page, images are about 91% of the weight. Block image, font, media, and stylesheet resource types (plus tracking hosts) with setRequestInterception in Puppeteer or page.route() in Playwright, and add --blink-settings=imagesEnabled=false as a backstop. Published tests show pages dropping from ~1.9 MB to under 10 KB. Verify your selectors still parse after blocking CSS. On a per-GB proxy, this is the difference between dollars and pennies per thousand pages.
Where a page's bytes actually go
You cannot optimize what you have not measured, so start with the breakdown. Published browser-scraping tests put a typical page at roughly 3.3 MB across about 49 requests, and the split is lopsided.
| Resource type | Share of page weight | Does your parser need it? |
|---|---|---|
| Images | ~91% | Almost never |
| JavaScript | ~5% | Sometimes — needed to render content |
| CSS | ~2% | Rarely |
| HTML document | ~1% | Yes — this is your data |
| Fonts, media, other | ~1% | No |
Read that top row again: nine out of ten bytes you pay for are images you throw away. The HTML you actually parse is about one percent of the download. Blocking the dead weight is not a micro-optimization — it is the whole game.
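The table is a published average, so your own targets may split differently. Here is a minimal sketch of how you might tally bytes per resource type yourself, assuming you feed it the payloads of the Chrome DevTools Protocol events Network.responseReceived and Network.loadingFinished. The name createByteTally is a hypothetical helper, not part of Puppeteer or Playwright:

```javascript
// Hypothetical helper: aggregates encoded bytes per resource type
// from Chrome DevTools Protocol (CDP) Network events.
function createByteTally() {
  const typeByRequest = new Map(); // requestId -> resource type
  const bytesByType = new Map();   // resource type -> total encoded bytes

  return {
    // Pass the payload of a Network.responseReceived event.
    onResponseReceived(event) {
      typeByRequest.set(event.requestId, event.type);
    },
    // Pass the payload of a Network.loadingFinished event.
    onLoadingFinished(event) {
      const type = typeByRequest.get(event.requestId) || 'Other';
      const soFar = bytesByType.get(type) || 0;
      bytesByType.set(type, soFar + event.encodedDataLength);
    },
    // Returns [{ type, bytes, share }] sorted by bytes, largest first.
    report() {
      const total = [...bytesByType.values()].reduce((a, b) => a + b, 0);
      return [...bytesByType.entries()]
        .map(([type, bytes]) => ({ type, bytes, share: total ? bytes / total : 0 }))
        .sort((a, b) => b.bytes - a.bytes);
    },
  };
}
```

One way to wire it up in Puppeteer would be to open a CDP session, send Network.enable, and register tally.onResponseReceived and tally.onLoadingFinished as listeners for the two events. The methods do not rely on `this`, so they can be passed directly as callbacks.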
Blocking resources in Puppeteer
Puppeteer's tool for this is request interception. You turn it on, then decide per request whether to continue() or abort(); every intercepted request must get one or the other, or it stalls. Aborting by resource type is the core move:
const puppeteer = require('puppeteer-extra');
const Stealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(Stealth());

const BLOCK_TYPES = new Set(['image', 'font', 'media', 'stylesheet']);

async function leanPage(browser) {
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', (req) => {
    if (BLOCK_TYPES.has(req.resourceType())) {
      return req.abort();
    }
    req.continue();
  });
  return page;
}
Notice what is not in the block set: document, script, and xhr/fetch. Those carry the HTML and the JavaScript that builds the results, so keep them. For a search engine that renders results client-side — like the headless approach in scraping Google for free — you need scripts to run; you just do not need the imagery they decorate the page with.
Also block tracking and analytics hosts
Resource types catch most of the weight, but a second category sneaks bytes through: analytics, ads, and tag-manager scripts. They load as script (which you are allowing) yet contribute nothing to your data. Block them by host:
const BLOCK_HOSTS = [
  'google-analytics.com', 'googletagmanager.com',
  'doubleclick.net', 'facebook.net', 'hotjar.com',
  'segment.io', 'amplitude.com',
];

// Register this in place of the type-only handler, not alongside it:
// each intercepted request can be resolved only once.
page.on('request', (req) => {
  const url = req.url();
  const blockedType = BLOCK_TYPES.has(req.resourceType());
  // Substring match on the full URL: simple, but broad.
  const blockedHost = BLOCK_HOSTS.some((h) => url.includes(h));
  if (blockedType || blockedHost) {
    return req.abort();
  }
  req.continue();
});
This combined filter — types plus hosts — is what a production scraper actually ships. It is also kinder to the target site, since you are not pulling its ad and analytics payloads on every request.
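Substring matching on the full URL is simple, but it can also fire when a blocked name merely appears in a path or query string. Here is a stricter variant, sketched on the assumption that you want to match the parsed hostname exactly or as a parent domain. The name isBlockedHost is a hypothetical helper, and the host list repeats the one from the Puppeteer example:

```javascript
const BLOCK_HOSTS = [
  'google-analytics.com', 'googletagmanager.com',
  'doubleclick.net', 'facebook.net', 'hotjar.com',
  'segment.io', 'amplitude.com',
];

// Hypothetical helper: true when the URL's hostname is a blocked host
// or a subdomain of one. Anything that is not a parseable absolute URL
// is treated as not blocked.
function isBlockedHost(rawUrl, blockedHosts = BLOCK_HOSTS) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname;
  } catch {
    return false;
  }
  return blockedHosts.some(
    (h) => hostname === h || hostname.endsWith('.' + h)
  );
}
```

Inside the request handler, this would replace the url.includes() check with isBlockedHost(req.url()). The trade-off is that a tracker served from a host you have not listed gets through, so the list needs occasional upkeep either way.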
Blocking resources in Playwright
Playwright does the same thing through page.route(), which intercepts requests matching a glob. The logic is identical; only the API differs:
const { chromium } = require('playwright');

const BLOCK_TYPES = ['image', 'font', 'media', 'stylesheet'];

(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.route('**/*', (route) => {
    const type = route.request().resourceType();
    if (BLOCK_TYPES.includes(type)) {
      return route.abort();
    }
    route.continue();
  });
  await page.goto('https://example.com/search?q=test', {
    waitUntil: 'domcontentloaded',
  });
  // ... parse the DOM ...
  await browser.close();
})();
In Python the call is the same shape — page.route("**/*", handler) with route.abort() and route.continue_(). Whichever stack you use, the network APIs are documented in the Puppeteer and Playwright network guides.
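The Playwright example filters by type only. Here is a rough sketch of how the combined types-plus-hosts filter might look as a reusable Playwright route handler. The name makeRouteHandler and its option names are hypothetical; the only Playwright calls it relies on are route.request(), request.resourceType(), request.url(), route.abort(), and route.continue():

```javascript
// Hypothetical factory: builds a handler suitable for page.route('**/*', handler).
function makeRouteHandler({
  blockTypes = ['image', 'font', 'media', 'stylesheet'],
  blockHosts = [],
} = {}) {
  const types = new Set(blockTypes);

  return async (route) => {
    const request = route.request();

    // Parse the hostname; leave it empty if the URL cannot be parsed.
    let hostname = '';
    try {
      hostname = new URL(request.url()).hostname;
    } catch {
      /* keep hostname empty so no host rule matches */
    }

    const blockedHost = blockHosts.some(
      (h) => hostname === h || hostname.endsWith('.' + h)
    );

    if (types.has(request.resourceType()) || blockedHost) {
      await route.abort();
    } else {
      await route.continue();
    }
  };
}
```

Usage would be along the lines of await page.route('**/*', makeRouteHandler({ blockHosts: ['doubleclick.net'] })). Keeping the decision in one factory means a per-site override, such as allowing stylesheets, is just a different blockTypes array.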
Launch flags as a backstop
Interception is the flexible workhorse, but a couple of launch flags add a cheap second layer that stops images at the rendering-engine level before interception even sees them:
const browser = await puppeteer.launch({
  // Current Puppeteer releases use `true` for the new headless mode;
  // 'new' was the opt-in value in older releases.
  headless: true,
  args: [
    '--no-sandbox',
    '--disable-dev-shm-usage',
    '--blink-settings=imagesEnabled=false', // engine-level image off-switch
    '--disable-blink-features=AutomationControlled',
  ],
});
The imagesEnabled=false flag and request interception are belt-and-braces: if an image somehow slips past the type filter, the engine-level setting still keeps it from loading. Use both; the flag costs nothing to add.
Measuring the bytes you save
Do not take the savings on faith — measure them, because that number is your proxy bill. The most accurate way in a Chromium browser is the DevTools protocol, which reports the real encoded bytes on the wire:
const client = await page.target().createCDPSession();
await client.send('Network.enable');

let bytes = 0;
client.on('Network.loadingFinished', (e) => {
  bytes += e.encodedDataLength; // actual bytes received, post-compression
});

await page.goto('https://example.com/search?q=test', {
  waitUntil: 'networkidle2',
});
console.log(`Downloaded ${(bytes / 1024).toFixed(1)} KB`);
Run it once with blocking off and once on. The difference is dramatic: published before-and-after tests show a page falling from around 1.9 MB to under 10 KB once images and other heavy resources are aborted. Multiply that by your daily page count and your per-gigabyte rate to see the real money — the full cost model is in the true cost of a Google scraper in 2026.
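To make that multiplication concrete, here is a small sketch using the 1.9 MB and 10 KB figures quoted above. The $5 per GB rate and the function names are assumptions for illustration only, and the code treats a gigabyte as 10^9 bytes, which may not match how your provider meters:

```javascript
// Assumption: decimal gigabytes. Some providers may bill in GiB (2^30 bytes).
const BYTES_PER_GB = 1e9;

// Hypothetical helper: proxy cost in USD for a batch of pages.
function proxyCostUsd(bytesPerPage, pages, usdPerGb) {
  return ((bytesPerPage * pages) / BYTES_PER_GB) * usdPerGb;
}

// Hypothetical helper: percentage of bytes saved by blocking.
function savingsPercent(beforeBytes, afterBytes) {
  return (1 - afterBytes / beforeBytes) * 100;
}

const RATE = 5;        // assumed USD per GB, not a quoted price
const PAGES = 1000;
const BEFORE = 1.9e6;  // ~1.9 MB per page, blocking off
const AFTER = 10e3;    // ~10 KB per page, blocking on

console.log(`Before: $${proxyCostUsd(BEFORE, PAGES, RATE).toFixed(2)} per 1,000 pages`); // $9.50
console.log(`After:  $${proxyCostUsd(AFTER, PAGES, RATE).toFixed(2)} per 1,000 pages`);  // $0.05
console.log(`Saved:  ${savingsPercent(BEFORE, AFTER).toFixed(1)}%`);                     // 99.5%
```

Swap in the byte counts from your own before-and-after measurement and your real rate; the shape of the result, dollars versus pennies per thousand pages, tends to hold as long as images dominate the page.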
The CSS gotcha
One warning before you block everything. Most scrapers read the DOM, which exists whether or not CSS loads, so blocking stylesheets is usually safe. But not always.
Some sites only reveal or render certain elements after CSS applies, or trigger lazy-loading tied to layout, and blocking stylesheets can make those nodes vanish from the DOM your selectors target. The rule of thumb: block CSS, then assert your results still parse. If a particular site comes back empty, allow stylesheet for that target and keep blocking images, fonts, and media — you still capture the overwhelming majority of the savings, since images are the 91%.
This is also why silent failures are dangerous: a too-aggressive block list returns a valid page with zero results, and a naive scraper logs success. Assert on result counts, the same discipline from why your SERP scraper breaks at 3 a.m., and bandwidth blocking stays a pure win.
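One way to turn that rule of thumb into code is sketched below. It assumes you already have a scrape(url, blockTypes) function of your own that loads the page with the given resource types blocked and resolves to an array of parsed results; that function, and the name scrapeWithCssFallback, are hypothetical:

```javascript
const FULL_BLOCK = ['image', 'font', 'media', 'stylesheet'];
const SAFE_BLOCK = ['image', 'font', 'media']; // stylesheets allowed

// Hypothetical helper: try the aggressive block list first, fall back to
// allowing stylesheets, and throw instead of silently returning nothing.
async function scrapeWithCssFallback(scrape, url, minResults = 1) {
  for (const blockTypes of [FULL_BLOCK, SAFE_BLOCK]) {
    const results = await scrape(url, blockTypes);
    if (Array.isArray(results) && results.length >= minResults) {
      return { results, blockTypes };
    }
  }
  throw new Error(
    `Expected at least ${minResults} result(s) from ${url}, ` +
    'got fewer even with stylesheets allowed'
  );
}
```

The returned blockTypes tells you which list worked, so you could cache it per target and skip the failed first attempt on later runs. The thrown error is the point: an empty page surfaces as a failure rather than a logged success.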
When you'd rather not meter bytes at all
Blocking resources is the right move when you run your own scrapers. But the deeper point is that proxy bandwidth is a cost you only carry because you are doing the access yourself. A SERP API moves that cost off your books entirely — you pay per call, not per gigabyte:
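import requests

resp = requests.get(
    "https://api.apiserpent.com/api/search",
    headers={"X-API-Key": "sk_live_your_key"},
    params={"q": "best running shoes 2026", "engine": "google", "country": "us"},
    timeout=30,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body

for r in resp.json()["results"]["organic"]:
    print(r["position"], r["title"], r["url"])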
No proxy bytes, no interception logic, no measuring — just clean JSON across Google, Bing, Yahoo, and DuckDuckGo. When your own scraping infrastructure starts to need a queue, circuit breakers, and a proxy budget, compare the maths in running SERP data at scale, or try the playground.
Stop paying for bytes you throw away.
Serpent returns clean JSON for Google, Bing, Yahoo, and DuckDuckGo with no per-gigabyte proxy bill on your side — the bandwidth is our problem. Get 10 free Google searches on signup, then pay-as-you-go from $0.03 per 10,000 searches at scale, with no subscription.
FAQ
How much bandwidth can I really save by blocking resources?
A lot. Images alone are roughly 91% of a typical page's weight, so blocking images, fonts, media, and stylesheets routinely cuts a multi-megabyte page down to tens of kilobytes — published before-and-after tests show pages dropping from around 1.9 MB to under 10 KB. On a per-gigabyte residential proxy plan that is the difference between dollars and pennies per thousand pages.
Will blocking CSS break my scraper?
Sometimes. Most scrapers read the DOM, which exists with or without CSS, so blocking stylesheets is usually safe. But some sites only render or reveal certain elements after CSS loads, or use it to lazy-load content, which can hide nodes your selectors need. Block CSS, then verify your results still parse; if a site misbehaves, allow stylesheets and keep blocking images and fonts.
Should I use the imagesEnabled launch flag or request interception?
Use both. The launch flag --blink-settings=imagesEnabled=false stops images at the rendering-engine level, while request interception lets you abort images, fonts, media, stylesheets, and tracking hosts before they download. Interception is more flexible and catches more; the flag is a cheap backstop. Together they minimize bytes over the proxy.
Does blocking resources speed up scraping too?
Yes. Fewer requests and fewer bytes mean pages finish loading faster, so blocking heavy resources cuts both your proxy bill and your per-page time. It also lowers memory pressure on the browser, which matters when you run many headless pages in parallel.



