Scrape Google Search Results in Python (2026): 3 Methods Tested Side by Side
Most "scrape Google with Python" tutorials show one method — whichever method the author is selling. The DIY ones tell you BeautifulSoup is enough. The proxy companies tell you headless browsers are the answer. The SERP API companies say only an API works. They are all partially right. The truth depends on what you are doing.
I built three working scrapers in Python in May 2026, ran the same 50 queries through each, and recorded what actually happened. This guide is the result — the most thorough comparison you will find on this topic, with code that you can copy and run today.
The Three Methods
- DIY: requests + BeautifulSoup, with rotating residential proxies and realistic headers.
- Headless browser: Playwright running real Chromium, residential proxy attached.
- SERP API: a managed Google scraping API (Serpent API in the test).
Same 50 queries, same machine, same hour. I logged success rate, time per query, total cost, and engineering time to ship.
Headline Results
| Method | Success rate | Time / query | Cost / 1K queries | Setup time | Maintenance |
|---|---|---|---|---|---|
| DIY (requests + BS4) | 62% | 3.5s + retries | ~$2–$5 (residential proxy) | ~6 hours | Constant (parser breaks) |
| Headless (Playwright) | 88% | 6–10s | ~$5–$15 (proxy + browser) | ~8 hours | Moderate |
| SERP API (Serpent) | 99% | 1.6s | ~$0.30 | ~30 minutes | None |
The SERP API wins on every metric. The DIY and headless methods are still useful in specific cases — mostly research, learning, or when a SERP API is unavailable in your jurisdiction. For production, the API is the only sensible choice in 2026.
Method 1: DIY with requests + BeautifulSoup
The cheapest-looking method. Also the most fragile.
```python
import random
import time
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup

PROXIES = [
    "http://user:pass@residential1.example.com:8000",
    "http://user:pass@residential2.example.com:8000",
    # ...rotate through a pool of 20+ residential IPs
]

USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15...",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    # ...10+ realistic UAs
]

def scrape_google_diy(query):
    # quote_plus handles queries with &, ?, # and other special characters
    url = f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml...",
        "Accept-Language": "en-US,en;q=0.9",
    }
    # Use one proxy for both schemes; two different IPs in one request is a bot tell
    proxy_url = random.choice(PROXIES)
    proxies = {"http": proxy_url, "https": proxy_url}
    time.sleep(random.uniform(1, 3))  # jitter between requests
    r = requests.get(url, headers=headers, proxies=proxies, timeout=20)
    if r.status_code != 200 or "Our systems have detected" in r.text:
        return None  # CAPTCHA or block
    soup = BeautifulSoup(r.text, "html.parser")
    results = []
    for el in soup.select("div.g"):  # selectors break frequently
        title_el = el.select_one("h3")
        link_el = el.select_one("a")
        if not (title_el and link_el):
            continue
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el.get("href", ""),
        })
    return results
```
What happened in the test
- Success: 31 of 50 queries. Of the 19 failures, 12 hit a CAPTCHA, 5 returned a 429, and 2 returned HTML where my selectors did not match (Google had shipped a layout variant).
- Time per query: 3.5 seconds successful path, 8 to 12 seconds with retry.
- Cost: $2 to $5 per 1,000 queries on residential proxy pricing alone, plus $0 to $50/month in amortised engineering time.
- Maintenance: the `div.g` selector broke twice during development. Google ships layout variants weekly.
This method works for learning or for one-off scrapes of a few hundred queries. It fails for production at any scale because the maintenance is constant.
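The retry times quoted above come from wrapping the scraper in a simple backoff loop. A minimal sketch — the attempt count and backoff base are my choices, not magic values:

```python
import random
import time

def with_retries(scrape_fn, query, attempts=4, base_delay=2.0):
    """Call scrape_fn(query) until it returns results, backing off between failures."""
    for attempt in range(attempts):
        results = scrape_fn(query)
        if results is not None:
            return results
        if attempt < attempts - 1:
            # Exponential backoff with jitter: roughly 2s, 4s, 8s between tries
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    return None  # every attempt was blocked
```

The jitter matters: fixed retry intervals are themselves a bot signal.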
Method 2: Headless Browser (Playwright)
A real browser sidesteps most of the bot detection that catches the requests-based scraper. Slower and more expensive, but more reliable.
```python
import asyncio
from urllib.parse import quote_plus

from playwright.async_api import async_playwright

async def scrape_google_headless(query, proxy_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": proxy_url}, headless=True
        )
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5)...",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await ctx.new_page()
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )
        if await page.locator("form#captcha-form").count():
            await browser.close()
            return None  # CAPTCHA
        results = await page.eval_on_selector_all(
            "div.g",
            """elements => elements.map(el => ({
                title: el.querySelector('h3')?.innerText || '',
                url: el.querySelector('a')?.href || '',
            }))"""
        )
        await browser.close()
        return [r for r in results if r["title"] and r["url"]]

# Run
results = asyncio.run(scrape_google_headless("best protein powder", PROXY_URL))
```
What happened in the test
- Success: 44 of 50 queries. Of the 6 failures, 4 hit a CAPTCHA, 2 timed out waiting for network idle.
- Time per query: 6 to 10 seconds (browser startup + render + parse).
- Cost: $5 to $15 per 1,000 queries. Residential proxy bandwidth is higher because each query loads ~2 MB of full SERP HTML/CSS/JS, vs 60 KB for the bare HTML in the requests approach.
- Maintenance: moderate. Selectors still break, but less often. The browser handles JavaScript-rendered SERP sections that BS4 would miss.
This method works for production at modest scale (under 50K queries a month) when you can absorb the cost. Beyond that, the browser overhead and proxy bandwidth start to dominate.
Method 3: SERP API
The boring, working method. One HTTP call.
```python
import requests

API_KEY = "sk_live_your_key_here"

def scrape_google_api(query):
    r = requests.get("https://apiserpent.com/api/search", params={
        "q": query, "engine": "google", "country": "us",
        "api_key": API_KEY,
    }, timeout=30)
    data = r.json()
    return data.get("organic_results", [])
```
What happened in the test
- Success: 50 of 50. Zero CAPTCHAs (the API takes care of that).
- Time per query: 1.6 seconds median.
- Cost: $0.30 per 1,000 quick searches at Scale tier.
- Maintenance: none. The API team handles parser maintenance across customers.
- Bonus data: AI Overview text, source citations, ads, related searches, People Also Ask — all already parsed in the JSON response.
For 99% of teams scraping Google in 2026, this is the only method that makes economic sense.
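The bonus features arrive in the same JSON payload as the organic results. The key names below (`ai_overview`, `related_searches`, `people_also_ask`) are my guesses at the response shape, not the documented schema — check the provider's docs before relying on them:

```python
def summarize_serp(data):
    """Pull the extra SERP features out of an already-parsed API response dict."""
    return {
        "organic": [r.get("title") for r in data.get("organic_results", [])],
        "ai_overview": (data.get("ai_overview") or {}).get("text"),
        "related": data.get("related_searches", []),
        "paa": [q.get("question") for q in data.get("people_also_ask", [])],
    }

# A hand-made sample response, illustrating the assumed shape
sample = {
    "organic_results": [{"title": "Best protein powders of 2026"}],
    "ai_overview": {"text": "Whey and casein are the most studied..."},
    "related_searches": ["best vegan protein powder"],
    "people_also_ask": [{"question": "Is whey protein safe?"}],
}
print(summarize_serp(sample)["organic"])  # ['Best protein powders of 2026']
```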
The Cost Math at Scale
The interesting comparison is what each method costs at real volumes once you include engineering time:
| Volume / month | DIY (proxy + eng time) | Headless (proxy + eng time) | SERP API (Serpent Scale) |
|---|---|---|---|
| 1,000 | $2 + 4 hrs/mo eng = ~$402 | $5 + 4 hrs/mo eng = ~$405 | $0.30 |
| 10,000 | $20 + 8 hrs/mo eng = ~$820 | $50 + 8 hrs/mo eng = ~$850 | $3 |
| 100,000 | $200 + 16 hrs/mo eng = ~$1,800 | $500 + 16 hrs/mo eng = ~$2,100 | $30 |
| 1,000,000 | $2,000 + 40 hrs/mo eng = ~$6,000 | $5,000 + 40 hrs/mo eng = ~$9,000 | $300 |
I am pricing engineering time at $100/hour, which is conservative. Even if you do not value your time at a market rate, the proxy cost alone is 6× to 30× higher on DIY/headless than on the SERP API.
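The table's arithmetic is easy to reproduce. A throwaway calculator, with the per-1K rates and monthly engineering hours from the table baked in as assumptions:

```python
def monthly_cost(queries, per_1k, eng_hours, rate=100):
    """Total monthly cost: variable proxy/API spend plus engineering time at $rate/hr."""
    return queries / 1000 * per_1k + eng_hours * rate

# DIY at 10K queries/month: $2/1K proxy spend plus ~8 hours of parser fixes
print(monthly_cost(10_000, per_1k=2, eng_hours=8))     # 820.0
# SERP API at the same volume: $0.30/1K, no maintenance hours
print(monthly_cost(10_000, per_1k=0.30, eng_hours=0))  # ~3.0
```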
When DIY or Headless Still Wins
To stay honest, three cases where direct scraping is the right call:
- You are learning. Building a scraper in BS4 is excellent practice. Just do not ship it to production.
- You need a SERP feature no API exposes. Edge cases like "extract the exact pixel position of an ad badge". Most APIs expose all the structured data you need; if you are in the rare exception, headless is the path.
- You are in a region SERP APIs cannot serve. A handful of jurisdictions block the major SERP API CDNs. If you are running from one of them, direct scraping with a local proxy may be your only choice.
The Decision Tree
- Production at any scale, you want it to just work. → SERP API.
- You need full browser interactivity (clicks, screenshots, JS state). → Playwright + residential proxy.
- You are learning or scraping a few hundred queries one time. → requests + BS4 with a proxy pool.
- You hit a CAPTCHA wall on DIY and headless and you do not want to pay residential proxy bills. → SERP API.
Common Mistakes
- Using datacenter proxies. Google blocks them within a few queries. Always use residential or mobile.
- Reusing one User-Agent. Pair UA rotation with IP rotation; changing one signal but not the other still looks like a bot.
- Ignoring `hl` and `gl`. Google's results vary by language and country. Pin them explicitly to get reproducible data.
- Hitting Google at peak hours from a fresh IP. CAPTCHAs spike at peak hours. Rate-limit your scraper and run during off-peak.
- Not handling the "Did you mean" rewrite. Google sometimes auto-corrects your query. The response will still parse but the data is for a different query. Check for the rewrite warning.
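To make the `hl`/`gl` and query-rewrite points concrete, a small helper sketch — the "Did you mean" marker strings are a heuristic against Google's current English UI, not a stable contract:

```python
from urllib.parse import urlencode

def build_search_url(query, hl="en", gl="us"):
    """Pin language (hl) and country (gl) so repeated runs are comparable."""
    return "https://www.google.com/search?" + urlencode(
        {"q": query, "hl": hl, "gl": gl}
    )

def was_query_rewritten(html):
    # Rough heuristic: Google's spell-correction banner. The exact markup
    # changes over time; treat this as a starting point, not a guarantee.
    return "Showing results for" in html or "Did you mean" in html

print(build_search_url("best protein powder"))
# https://www.google.com/search?q=best+protein+powder&hl=en&gl=us
```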
Legal Footnote
Reading public SERP pages has been upheld as lawful under public-data court precedent in major markets. Google's terms of service forbid automated access, so contractually you are violating ToS, but ToS is not a criminal statute. Most teams choose a SERP API because it shifts the legal exposure to a vendor with formal compliance posture, not because direct scraping is illegal.
If you are scraping for competitive intelligence at any meaningful scale, talk to a lawyer regardless of which method you pick.
Skip the CAPTCHAs and Parser Maintenance
Serpent API gives every new account 10 free Google searches with full SERP feature parsing including AI Overview text and source citations — no credit card. Pay-as-you-go after that, from $0.30 per 1,000 quick searches at Scale tier.
FAQ
Can you still scrape Google with BeautifulSoup in 2026?
Technically yes, practically no. A naive script gets a CAPTCHA within 20 to 50 queries. With heavy effort (rotating residential proxies, realistic headers) you can extend the run, but the parser keeps breaking. For more than a few hundred queries a month, the method is no longer worth the engineering time.
How does a headless browser perform vs a SERP API?
Headless Playwright + residential proxy hits ~88% success in 6 to 10 seconds at $5 to $15 per 1,000. SERP API hits 99% in 1 to 3 seconds at $0.30 per 1,000.
Is scraping Google legal?
Reading public SERP pages has been upheld as lawful under US public-data court precedent. Google's ToS forbids automated access. Most teams use a SERP API to shift the legal exposure rather than because direct scraping is criminal.
Why is the SERP API the cheapest option?
Providers amortise proxy costs and parser maintenance across many customers. At provider scale, a residential proxy request costs under $0.01; an individual scraper pays $0.005 to $0.02 per query for the same capacity, plus unamortised engineering time on parser fixes.
How do I rotate proxies in Python?
Pass a proxy URL to the requests session: `session.proxies = {'http': 'http://user:pass@proxy:port'}`. For rotation, maintain a list of proxy URLs and pick a fresh one per request. Use a residential provider; datacenter IPs get blocked.
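A minimal round-robin rotation pattern (swap in `random.choice` if you prefer random selection; the proxy URLs are placeholders):

```python
from itertools import cycle

PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
    "http://user:pass@proxy3.example.com:8000",
]

proxy_pool = cycle(PROXIES)  # endless round-robin iterator

def next_proxy():
    """Return a requests-style proxies dict, same IP for both schemes."""
    url = next(proxy_pool)
    return {"http": url, "https": url}

print(next_proxy()["http"])  # http://user:pass@proxy1.example.com:8000
```

Each call would then be `requests.get(url, proxies=next_proxy(), ...)`.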
Can I cache results to save cost?
Yes. SERP results are stable for 12 to 24 hours for most queries. Cache the response keyed on (query, country, language). For brand SERP monitoring, you may want a shorter TTL because your competitor positions can shift hour by hour.
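A sketch of the cache itself — in-process for simplicity (production would reach for Redis or similar), with the fetch function standing in for any of the three scraping methods:

```python
import time

_cache = {}

def cached_search(query, country, language, fetch, ttl=12 * 3600):
    """Return cached results for (query, country, language); refetch after ttl seconds."""
    key = (query, country, language)
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < ttl:
        return hit[1]  # fresh enough, skip the paid call
    results = fetch(query, country, language)
    _cache[key] = (time.time(), results)
    return results
```

For brand SERP monitoring, pass a smaller `ttl` (say, 3600) so hourly position shifts show up.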

