How to Scrape Google Without Getting Blocked in 2026 (The Complete Playbook)
You wrote a small script to pull Google results. It worked for an hour. Then came the CAPTCHA. Then a wall of 429 Too Many Requests. By morning your IP was answering every query with a "we're sorry" page.
This is the single most common story in web scraping, and it has only gotten worse in 2026. Google's defenses are no longer "block bad IPs". They read your TLS handshake, your headers, your timing, and your behavior all at once, and decide within milliseconds whether you are a human or a bot.
This playbook is honest. It covers the legitimate, public best practices that genuinely reduce blocks if you do it yourself. It also tells you the truth most "12 tricks" articles will not: even done well, DIY Google scraping is an ongoing cat-and-mouse game that breaks at the worst time.
TL;DR: Google blocks scrapers using IP reputation, TLS/HTTP fingerprinting, JS-rendered results, and behavioral detection — all at once. If you DIY, use a real browser, realistic headers, residential IPs, slow randomized request rates, and exponential backoff. But at any real volume the cleanest answer is a SERP API: send one request, get clean JSON back, and skip the proxy pool, headless browser, and CAPTCHAs entirely.
Why Google blocks scrapers in 2026
Google blocks scrapers because modern anti-bot detection no longer trusts a single signal: it scores your IP, your fingerprint, and your behavior together. Pass three checks and fail one, and you are still blocked.
Here are the layers you are actually up against today.
1. JS-rendered results. Google's results page is not a simple HTML document anymore. A lot of it is assembled by JavaScript in the browser. A plain HTTP request that just downloads the raw HTML often gets a near-empty shell — or a consent wall — not the results you wanted.
2. TLS and HTTP fingerprinting. Before a single byte of HTML is sent, your client completes a TLS handshake. That handshake has a fingerprint (often called JA3/JA4). Python's requests library, Go's default client, and curl each produce a fingerprint that no real Chrome or Safari ever produces. Anti-bot systems match your fingerprint against known browsers in the first few milliseconds.
3. Datacenter IP blocklists. The IP ranges of AWS, Google Cloud, Azure, and most cheap proxy providers are well known. Traffic from them is treated as guilty until proven innocent. A clean-looking request from a datacenter IP still draws extra scrutiny.
4. Behavioral and ML detection. The newest layer watches how you act over time: request timing, the rhythm between calls, whether you ever load page 2, whether your "browser" moves a mouse or scrolls. Trust now accumulates across a session. One robotic burst resets it to zero.
When you fail enough of these checks, Google does not politely error out. It serves a CAPTCHA, a soft-block ("unusual traffic"), or a hard 429. Telling these apart in code is its own headache — we cover that in why your SERP scraper breaks at 3 a.m.
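To make the fingerprinting layer concrete, here is a minimal sketch of how a JA3-style hash is derived: five fields from the TLS ClientHello are joined into one string and MD5-hashed. The numeric values below are made up for illustration and were not captured from any real client.

```python
import hashlib

def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Join the five ClientHello fields JA3 uses and return the MD5 hex digest."""
    fields = [
        str(tls_version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return hashlib.md5(ja3_string.encode("ascii")).hexdigest()

# Hypothetical clients: identical except for the order of two cipher suites.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
client_b = ja3_hash(771, [4866, 4865, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
```

Because the hash covers the order of ciphers and extensions, two clients that differ only slightly still produce different fingerprints. That is why an HTTP library is easy to tell apart from a browser even when its headers are copied perfectly.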
The hidden multiplier: the &num=100 change
There is a 2026 wrinkle that quietly makes blocking worse: collecting deep results now takes ten times more requests than it used to.
For years you could append &num=100 to a Google search URL and get 100 results on one page. On September 11, 2025, Google switched that off. Now you get the standard 10 results per page, no matter what value you pass.
To see the top 100 you must paginate — results 1–10, then 11–20, all the way down — which is ten separate requests where one used to do the job.
That matters here because request volume is the thing that gets you blocked. Ten times more requests means ten times more chances to trip a fingerprint or rate check. We break down the full fallout in Google killed &num=100, but the short version: your scraping just got harder and more expensive at the same time.
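The arithmetic is easy to see in code. This is a rough sketch that assumes the conventional `start` offset parameter for pagination; the helper name `paginated_urls` is hypothetical.

```python
from urllib.parse import urlencode

def paginated_urls(query, depth=100, page_size=10):
    """Return one search URL per page needed to cover `depth` results."""
    pages = -(-depth // page_size)  # ceiling division
    return [
        "https://www.google.com/search?"
        + urlencode({"q": query, "start": page * page_size})
        for page in range(pages)
    ]

urls = paginated_urls("best running shoes 2026")
print(len(urls))  # number of requests needed for the top 100
```

Every URL in that list is a separate request, and each one is another chance to be scored and blocked.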
The honest DIY checklist (general best practice)
If you are going to scrape Google yourself, these are the public, widely-known best practices that genuinely lower your block rate. None of them is a guarantee — together they buy you time.
Send realistic browser headers and a current User-Agent. A real browser sends a full set of headers — Accept, Accept-Language, Accept-Encoding, Sec-Ch-Ua — in a specific order. A bare HTTP client sends almost none. At minimum, set a current Chrome User-Agent (Chrome 148 on macOS or Windows as of mid-2026) and match the rest of the header set a real browser would send. An outdated User-Agent string is itself a flag.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/148.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}
Slow down and randomize. Humans do not fire ten identical requests per second. Add a randomized delay between calls (say 5–15 seconds), vary it, and never hammer in a tight loop. Slower is the single most effective free lever you have.
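One possible way to apply this is a small generator that pauses a random interval between items. The name `paced` and the injectable `sleep` argument are illustrative choices, and the 5 to 15 second range is simply the example range mentioned above, not a known safe threshold.

```python
import random
import time

def paced(items, min_delay=5.0, max_delay=15.0, sleep=time.sleep):
    """Yield items one at a time, pausing a random interval between them."""
    for index, item in enumerate(items):
        if index > 0:
            # A fresh random wait each time, so no two gaps are identical.
            sleep(random.uniform(min_delay, max_delay))
        yield item
```

You would wrap your query list with it, for example `for query in paced(queries): ...`, so the pacing lives in one place instead of being scattered through the fetch code.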
Use exponential backoff on failures. When you hit a soft-block or a 429, do not retry immediately — that confirms you are a bot. Back off: wait 2s, then 4s, then 8s, with a little random jitter, and stop after a few tries.
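import random
import time

def get_with_backoff(fetch, max_tries=5):
    """Call fetch() until it returns HTTP 200, waiting 2s, 4s, 8s, ... plus jitter."""
    for attempt in range(max_tries):
        resp = fetch()
        if resp.status_code == 200:
            return resp
        if attempt < max_tries - 1:
            # 429 / soft-block: wait longer each time, with up to 1s of random jitter
            delay = (2 ** (attempt + 1)) + random.uniform(0, 1)
            time.sleep(delay)
    raise RuntimeError("blocked after retries")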
Render the JavaScript. Because results are JS-assembled, a raw HTML fetch often is not enough. A real headless browser (Playwright or Puppeteer) actually runs the page's JavaScript, so you get the rendered results — at the cost of far more CPU, memory, and bandwidth per query.
Handle sessions and cookies. Real users carry cookies and a consent state across requests. Persist them within a session so your traffic looks continuous rather than like a thousand strangers.
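As a rough illustration using only the standard library, the sketch below keeps cookies in a file so that a later run continues the same session. The function names and the cookie file path are hypothetical. If you use `requests`, a `requests.Session` gives you the same in-memory continuity.

```python
import http.cookiejar
import urllib.request

def build_session(cookie_path, headers):
    """Return (opener, jar): an opener that stores cookies and reloads saved ones."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    try:
        jar.load(ignore_discard=True, ignore_expires=True)
    except FileNotFoundError:
        pass  # first run: no saved cookies yet
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = list(headers.items())
    return opener, jar

def save_session(jar):
    """Persist cookies, including session cookies, so the next run continues."""
    jar.save(ignore_discard=True, ignore_expires=True)
```

Calling `opener.open(url)` sends any stored cookies and records new ones. Calling `save_session(jar)` at the end of a run is what carries the consent state forward.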
Monitor for soft-blocks. The dangerous failure is the silent one: Google returns 200 OK with a consent page or a CAPTCHA inside a valid response. Check the body, not just the status code, and alert when result counts drop to zero.
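A minimal sketch of that check might look like the following. The marker strings are assumptions for illustration, and real block pages vary by region and change over time, so treat the list as something you maintain rather than a fixed rule.

```python
# Assumed marker phrases; extend these from block pages you actually observe.
SOFT_BLOCK_MARKERS = ("unusual traffic", "captcha", "before you continue")

def classify_response(status_code, body, result_count):
    """Label a response using the body and parsed result count, not just the status."""
    if status_code == 429:
        return "rate_limited"
    lowered = body.lower()
    if any(marker in lowered for marker in SOFT_BLOCK_MARKERS):
        return "soft_block"
    if status_code == 200 and result_count == 0:
        return "empty"  # valid page but nothing parsed: consent wall or changed markup
    if status_code == 200:
        return "ok"
    return "error"
```

Only the `"ok"` label should count as success. Everything else should feed your backoff and alerting.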
For a full code walkthrough of the DIY route in Python, see 3 ways to scrape Google results in Python.
Datacenter vs residential proxies
Proxies change which IP Google sees, and the type of proxy matters more than the count. Here are the general trade-offs.
Datacenter proxies are cheap and fast, but they come from known cloud and hosting IP ranges. Google recognizes those ranges and treats them with suspicion, so they get blocked quickly for search scraping.
Residential proxies route through real consumer ISP connections, so they look like ordinary home users. They are far harder to block, but they cost much more (priced per gigabyte) and are slower.
The hard truth: a clean IP does not save a robotic request. If your TLS fingerprint, headers, or timing still scream "bot", the prettiest residential IP in the world gets blocked anyway. And IPs that get overused for Google get burned — the topic of why proxies get banned on Google.
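To show what tracking burned IPs can involve, here is a simplified sketch of a pool that retires a proxy after repeated blocks. The class name, the threshold of three consecutive failures, and the proxy URLs in the usage line are all hypothetical.

```python
import random

class ProxyPool:
    """Pick proxies at random and stop using ones that keep getting blocked."""

    def __init__(self, proxies, max_failures=3):
        self.failures = {proxy: 0 for proxy in proxies}
        self.max_failures = max_failures

    def healthy(self):
        return [p for p, count in self.failures.items() if count < self.max_failures]

    def pick(self):
        candidates = self.healthy()
        if not candidates:
            raise RuntimeError("all proxies burned")
        return random.choice(candidates)

    def report(self, proxy, blocked):
        # Count consecutive blocks; one clean response resets the counter.
        self.failures[proxy] = self.failures[proxy] + 1 if blocked else 0

pool = ProxyPool(["http://proxy-a.example:8000", "http://proxy-b.example:8000"])
```

This only manages which IP you use. It does nothing about fingerprints or timing, which is the point of the paragraph above.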
Reality check: a working DIY Google scraper is really three systems glued together — a rotating residential proxy pool, a fleet of headless browsers, and block-detection plus retry logic. Each one needs maintenance forever. That is the real cost most tutorials hide. We total it up in the true cost of a Google scraper in 2026.
Why "done right" still breaks
Even a well-built DIY scraper breaks, because the other side keeps changing. Anti-bot detection is a moving target maintained by a very large team, and you are one person reacting after the fact.
A few things that go wrong on a good build:
- Chrome ships a new version, the User-Agent and client hints shift, and your "current" fingerprint is suddenly stale.
- A proxy subnet you rely on gets burned, and your block rate jumps overnight with no code change on your side.
- Google quietly changes the results page markup, and your CSS selectors return empty, while still returning 200 OK.
- A behavioral model gets tuned, and the request rate that was safe last week now trips a CAPTCHA.
None of these throw a clean error you can catch once and forget. They show up as silent data gaps and 3 a.m. pages. Maintaining a scraper is not a project you finish; it is a subscription you pay in engineering time.
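Since these failures are silent, one option is to watch the data itself. The sketch below flags trouble when too many recent queries returned zero results. The class name, window size, and threshold are illustrative assumptions, not tuned values.

```python
from collections import deque

class ResultMonitor:
    """Track recent result counts and flag a suspicious share of empty responses."""

    def __init__(self, window=20, max_empty_ratio=0.3):
        self.counts = deque(maxlen=window)
        self.max_empty_ratio = max_empty_ratio

    def record(self, result_count):
        self.counts.append(result_count)

    def unhealthy(self):
        if len(self.counts) < self.counts.maxlen:
            return False  # not enough data to judge yet
        empty = sum(1 for count in self.counts if count == 0)
        return empty / len(self.counts) > self.max_empty_ratio
```

Call `record()` after every parsed response and alert when `unhealthy()` turns true. This catches markup changes and soft-blocks that never raise an exception.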
DIY scraping vs a SERP API
Here is the honest comparison, side by side, for production use at any real volume.
| Factor | DIY scraping | SERP API |
|---|---|---|
| Maintenance | Ongoing: proxies, browsers, selectors, fingerprints all rot | None — the API handles access and parsing for you |
| Block rate | Unpredictable; spikes whenever defenses change | Effectively zero from your side — you get clean JSON or a clear error |
| JS rendering | You run and pay for headless browsers | Handled; you just receive parsed results |
| Cost at scale | Residential bandwidth + servers + engineer time | Flat per-call; deep results don't multiply the price |
| Time to first result | Days to weeks of build and tuning | Minutes — one HTTP request |
| Legal posture | You own all access decisions and ToS exposure | You consume an API; the provider manages collection |
For a deeper breakdown of the two approaches, read web scraping vs a SERP API and Google Search API vs scraping.
The clean answer at scale
For anything in production, a SERP API removes the entire blocking problem, because access is handled for you and you just consume clean, structured data.
You send one HTTP request with your query. You get back JSON: organic results, ads, People Also Ask, related searches, featured snippets, AI Overviews, the local pack, and more. There is no proxy pool, no headless browser, and no CAPTCHA-solving for you to build or maintain.
Here is the entire thing in Python with the Serpent Google SERP API:
import requests

resp = requests.get(
    "https://api.apiserpent.com/api/search",
    headers={"X-API-Key": "sk_live_your_key"},
    params={
        "q": "best running shoes 2026",
        "engine": "google",
        "country": "us",
        "num": 100,  # top 100 in ONE call
    },
)
data = resp.json()
for r in data["results"]["organic"]:
    print(r["position"], r["title"], r["url"])
That single call returns up to 100 organic results — the &num=100 behavior Google removed, restored at the API layer. With Serpent, page depth does not multiply the price: a 100-result deep search costs the same as a 10-result one. And the wire response is the same whether you ask for one page or ten.
It is not just Google. You can switch engine to bing, yahoo, or ddg, or hit the dedicated news and image endpoints — same clean JSON, no extra block-fighting per engine. Try any query live in the playground before you write a line of code.
Stop fighting blocks. Start getting data.
Serpent handles the access — proxies, rendering, and CAPTCHAs are our problem, not yours — and returns clean JSON for Google, Bing, Yahoo, and DuckDuckGo. Get 10 free Google searches on signup, pay-as-you-go from $0.03 per 10,000 searches at scale, and never pay more for deep results. No subscription.
Is any of this legal?
Scraping publicly available web pages is broadly treated as legal in the United States, but "legal" and "allowed by the site" are not the same thing.
The key precedent is hiQ Labs v. LinkedIn, where the Ninth Circuit held that scraping public data — data you can see without logging in — likely does not violate the Computer Fraud and Abuse Act, following the Supreme Court's narrow reading of the CFAA in Van Buren.
The caveats matter, though. Risk rises sharply when you scrape behind a login, collect personal data without a lawful basis, reproduce copyrighted content, or overwhelm a server with excessive volume. Terms of service are a separate question from criminal law. We go deeper in is scraping Google legal in 2026. None of this is legal advice — if it matters to your business, talk to a lawyer.
The practical takeaway: scraping responsibly means low volume, respecting the site, and not pretending to be a logged-in user. Once you need volume, a SERP API keeps you on the well-trodden path of consuming an API instead of pounding a website.
FAQ
Why does Google block my scraper?
Google fingerprints the whole request, not just your IP. A plain HTTP library has a TLS handshake that no real browser produces, ships no real browser headers, and fires requests faster than any human. Combine that with a datacenter IP and Google flags it as a bot in milliseconds, then serves a CAPTCHA or a 429.
Is it legal to scrape Google search results?
Scraping publicly available web pages is broadly treated as legal in the US after hiQ v. LinkedIn, in which the Ninth Circuit held that scraping public data that requires no login likely does not violate the CFAA. But it can still breach a site's terms of service, and rules differ by country. This is general information, not legal advice.
Do residential proxies stop Google blocks?
They help, but they are not a magic fix. Residential IPs look more like real users than datacenter IPs, so they get blocked less often. But if your headers, TLS fingerprint, or request timing still look robotic, Google blocks you regardless of how clean the IP is.
What is the easiest way to scrape Google without getting blocked?
Use a SERP API. You send one HTTP request with your query and get back clean JSON of the results. There is no proxy pool, headless browser, or CAPTCHA-solving for you to build or maintain, because the API handles access for you. It is the only approach that stays reliable at scale.
How many Google requests can I send before getting blocked?
There is no fixed number. A single bad-looking request can trigger a CAPTCHA, while a careful, slow, browser-like session can run longer. Google weighs IP reputation, fingerprint, and behavior together, so the safe rate is low and unpredictable, which is exactly why DIY scraping is hard to keep stable.



