Scrape Google Search Results in Python (2026): 3 Methods Tested Side by Side

By Anurag Pathak · 16 min read

Most "scrape Google with Python" tutorials show one method — whichever method the author is selling. The DIY ones tell you BeautifulSoup is enough. The proxy companies tell you headless browsers are the answer. The SERP API companies say only an API works. They are all partially right. The truth depends on what you are doing.

I built three working scrapers in Python in May 2026, ran the same 50 queries through each, and recorded what actually happened. This guide is the result — the most thorough comparison you will find on this topic, with code that you can copy and run today.

The Three Methods

  1. DIY: requests + BeautifulSoup, with rotating residential proxies and realistic headers.
  2. Headless browser: Playwright running real Chromium, residential proxy attached.
  3. SERP API: a managed Google scraping API (Serpent API in the test).

Same 50 queries, same machine, same hour. I logged success rate, time per query, total cost, and engineering time to ship.

Headline Results

| Method | Success rate | Time / query | Cost / 1K queries | Setup time | Maintenance |
|---|---|---|---|---|---|
| DIY (requests + BS4) | 62% | 3.5s + retries | ~$2–$5 (residential proxy) | ~6 hours | Constant (parser breaks) |
| Headless (Playwright) | 88% | 6–10s | ~$5–$15 (proxy + browser) | ~8 hours | Moderate |
| SERP API (Serpent) | 99% | 1.6s | ~$0.30 | ~30 minutes | None |

The SERP API wins by every metric. The DIY and headless methods are still useful for specific cases — mostly research, learning, or when the SERP API is unavailable for your jurisdiction. For production, the API is the only sensible choice in 2026.

Method 1: DIY with requests + BeautifulSoup

The cheapest-looking method. Also the most fragile.

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus
import random, time

PROXIES = [
    "http://user:pass@residential1.example.com:8000",
    "http://user:pass@residential2.example.com:8000",
    # ...rotate through a pool of 20+ residential IPs
]

USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15...",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    # ...10+ realistic UAs
]

def scrape_google_diy(query):
    # quote_plus handles spaces AND special characters; a bare
    # .replace(' ', '+') breaks on queries containing &, #, etc.
    url = f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml...",
        "Accept-Language": "en-US,en;q=0.9",
    }
    # Pick ONE proxy per request and use it for both schemes;
    # routing http and https through different IPs is a bot signal
    proxy_url = random.choice(PROXIES)
    proxies = {"http": proxy_url, "https": proxy_url}
    time.sleep(random.uniform(1, 3))  # jitter to avoid a fixed cadence

    r = requests.get(url, headers=headers, proxies=proxies, timeout=20)
    if r.status_code != 200 or "Our systems have detected" in r.text:
        return None  # CAPTCHA or block

    soup = BeautifulSoup(r.text, "html.parser")
    results = []
    for el in soup.select("div.g"):  # selectors break frequently
        title_el = el.select_one("h3")
        link_el = el.select_one("a")
        if not (title_el and link_el):
            continue
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el.get("href", ""),
        })
    return results

What happened in the test

In the test run, 31 of the 50 queries came back parseable (the 62% in the headline table); the rest hit CAPTCHAs or block pages, even with rotating residential proxies. This method works for learning or for one-off scrapes of a few hundred queries. It fails in production at any scale because the maintenance is constant.
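
The "+ retries" in the headline table is doing real work. Here is a minimal retry wrapper, assuming exponential backoff is an acceptable policy; the delays are illustrative, not the test harness's values.

import time

def scrape_with_retries(query, max_attempts=3):
    # Each attempt re-enters scrape_google_diy, which picks a fresh
    # proxy and User-Agent, so every retry is also an identity rotation.
    for attempt in range(max_attempts):
        results = scrape_google_diy(query)
        if results is not None:
            return results
        time.sleep(2 ** attempt * 5)  # back off: 5s, 10s, 20s
    return None  # give up; log the query for a later pass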

Method 2: Headless Browser (Playwright)

A real browser sidesteps most of the bot detection that catches the requests-based scraper. Slower and more expensive, but more reliable.

from playwright.async_api import async_playwright
from urllib.parse import quote_plus
import asyncio

async def scrape_google_headless(query, proxy_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": proxy_url}, headless=True
        )
        # A realistic fingerprint: desktop UA, common viewport, pinned locale
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5)...",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await ctx.new_page()
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )
        if await page.locator("form#captcha-form").count():
            await browser.close()
            return None  # CAPTCHA

        # Extract title/URL pairs in a single page-side evaluation
        results = await page.eval_on_selector_all(
            "div.g",
            """elements => elements.map(el => ({
                title: el.querySelector('h3')?.innerText || '',
                url: el.querySelector('a')?.href || '',
            }))"""
        )
        await browser.close()
        return [r for r in results if r["title"] and r["url"]]

# Run
PROXY_URL = "http://user:pass@residential1.example.com:8000"
results = asyncio.run(scrape_google_headless("best protein powder", PROXY_URL))

What happened in the test

In the test run, 44 of the 50 queries succeeded (the 88% in the headline table). This method works for production at modest scale (under 50K queries a month) when you can absorb the cost. Beyond that, browser overhead and proxy bandwidth start to dominate.
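
At 6 to 10 seconds per query, throughput comes from concurrency rather than raw speed. Here is a minimal fan-out sketch; the cap of 5 parallel browsers is an illustrative number, not a value from the test.

import asyncio

async def scrape_many(queries, proxy_url, max_concurrent=5):
    # Cap parallel launches: each headless Chromium instance is heavy
    # on RAM, and too many simultaneous hits burns the proxy pool faster.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(query):
        async with sem:
            return query, await scrape_google_headless(query, proxy_url)

    pairs = await asyncio.gather(*(bounded(q) for q in queries))
    return dict(pairs)

# all_results = asyncio.run(scrape_many(["query one", "query two"], PROXY_URL))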

Method 3: SERP API

The boring, working method. One HTTP call.

import requests

API_KEY = "sk_live_your_key_here"

def scrape_google_api(query):
    r = requests.get("https://apiserpent.com/api/search", params={
        "q": query, "engine": "google", "country": "us",
        "api_key": API_KEY,
    }, timeout=30)
    r.raise_for_status()  # surface quota/auth errors instead of parsing them
    data = r.json()
    return data.get("organic_results", [])

What happened in the test

For 99% of teams scraping Google in 2026, this is the only method that makes economic sense.
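
Because each call is a single stateless HTTP request, scaling up is a plain thread pool; there are no proxies, browsers, or CAPTCHA states to coordinate. A sketch (the worker count is arbitrary):

from concurrent.futures import ThreadPoolExecutor

def scrape_batch(queries, max_workers=10):
    # Each query maps to one independent API call
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(queries, pool.map(scrape_google_api, queries)))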

The Cost Math at Scale

The interesting comparison is what each method costs at real volumes once you include engineering time:

| Volume / month | DIY (proxy + eng time) | Headless (proxy + eng time) | SERP API (Serpent Scale) |
|---|---|---|---|
| 1,000 | $2 + 4 hrs/mo eng = ~$402 | $5 + 4 hrs/mo eng = ~$405 | $0.30 |
| 10,000 | $20 + 8 hrs/mo eng = ~$820 | $50 + 8 hrs/mo eng = ~$850 | $3 |
| 100,000 | $200 + 16 hrs/mo eng = ~$1,800 | $500 + 16 hrs/mo eng = ~$2,100 | $30 |
| 1,000,000 | $2,000 + 40 hrs/mo eng = ~$6,000 | $5,000 + 40 hrs/mo eng = ~$9,000 | $300 |

I am pricing engineering time at $100/hour, which is conservative. Even if you do not value your time at a market rate, the proxy cost alone runs roughly 7× to 50× higher on DIY/headless than on the SERP API, using the per-1K figures from the headline table.
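
To plug in your own volumes and hourly rate, the table reduces to one line of arithmetic. A sketch using the figures above; the maintenance-hour estimates are observations from the test, not outputs of a formula.

def monthly_cost(queries, proxy_per_1k, eng_hours, hourly_rate=100):
    # Total monthly cost = proxy spend + engineering time
    return queries / 1000 * proxy_per_1k + eng_hours * hourly_rate

print(monthly_cost(10_000, proxy_per_1k=2.0, eng_hours=8))   # DIY: 820.0
print(monthly_cost(10_000, proxy_per_1k=0.30, eng_hours=0))  # SERP API: 3.0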

When DIY or Headless Still Wins

To stay honest, three cases where direct scraping is the right call:

  1. You are learning. Building a scraper in BS4 is excellent practice. Just do not ship it to production.
  2. You need a SERP feature no API exposes. Edge cases like "extract the exact pixel position of an ad badge". Most APIs expose all the structured data you need; if you are in the rare exception, headless is the path.
  3. You are in a region SERP APIs cannot serve. A handful of jurisdictions block the major SERP API CDNs. If you are running from one of them, direct scraping with a local proxy may be your only choice.

The Decision Tree

  1. Learning to scrape, or a one-off run of a few hundred queries? DIY with requests + BeautifulSoup.
  2. Need a SERP detail no API exposes, such as pixel-level layout? Headless Playwright.
  3. Running from a jurisdiction the SERP API CDNs cannot serve? Direct scraping with a local proxy.
  4. Anything else, at any production volume? A SERP API.

Common Mistakes

  1. Using datacenter proxies. Google blocks them within a few queries. Always use residential or mobile.
  2. Reusing one User-Agent. Pair UA rotation with IP rotation; rotating one signal but not the other still looks like a bot.
  3. Ignoring hl and gl. Google's results vary by language and country. Pin them explicitly to get reproducible data.
  4. Hitting Google at peak hours from a fresh IP. CAPTCHAs spike at peak hours. Rate-limit your scraper and run during off-peak.
  5. Not handling the "Did you mean" rewrite. Google sometimes auto-corrects your query. The response will still parse, but the data is for a different query. Check for the rewrite warning (see the sketch after this list).
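
Mistakes 3 and 5 are cheap to guard against in code. Here is a minimal sketch; the rewrite marker strings are an assumption about Google's current English-language markup, so verify them against live HTML before relying on them.

import requests

# Assumed markers for Google's auto-correct banner (English SERPs);
# these are not a documented API and need verifying against live HTML.
REWRITE_MARKERS = ("Showing results for", "Including results for")

def was_query_rewritten(html):
    # Detect Google silently auto-correcting the query
    return any(marker in html for marker in REWRITE_MARKERS)

r = requests.get(
    "https://www.google.com/search",
    params={
        "q": "best protien powder",  # deliberate typo to trigger a rewrite
        "hl": "en",                  # pin language...
        "gl": "us",                  # ...and country for reproducible data
    },
    headers={"User-Agent": "Mozilla/5.0 ..."},  # rotate as in Method 1
    timeout=20,
)
if was_query_rewritten(r.text):
    print("Result set is for a corrected query; discard or re-run verbatim")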

Legal Footnote

Reading public SERP pages has been upheld as lawful under public-data court precedent in major markets, most prominently the US hiQ v. LinkedIn rulings. Google's terms of service forbid automated access, so contractually you are violating the ToS, but a ToS is not a criminal statute. Most teams choose a SERP API because it shifts the legal exposure to a vendor with a formal compliance posture, not because direct scraping is illegal.

If you are scraping for competitive intelligence at any meaningful scale, talk to a lawyer regardless of which method you pick.

Skip the CAPTCHAs and Parser Maintenance

Serpent API gives every new account 10 free Google searches with full SERP feature parsing including AI Overview text and source citations — no credit card. Pay-as-you-go after that, from $0.30 per 1,000 quick searches at Scale tier.

Get Your Free API Key

Explore: SERP API · Playground · Web Scraping vs SERP API

FAQ

Can you still scrape Google with BeautifulSoup in 2026?

Technically yes, practically no. A naive script gets a CAPTCHA within 20 to 50 queries. With heavy effort (rotating residential proxies, realistic headers) you can extend the run, but the parser keeps breaking. For more than a few hundred queries a month, the method is no longer worth the engineering time.

How does a headless browser perform vs a SERP API?

Headless Playwright + residential proxy hits ~88% success in 6 to 10 seconds at $5 to $15 per 1,000. SERP API hits 99% in 1 to 3 seconds at $0.30 per 1,000.

Is scraping Google legal?

Reading public SERP pages has been upheld as lawful under US public-data court precedent. Google's ToS forbids automated access. Most teams use a SERP API to shift the legal exposure rather than because direct scraping is criminal.

Why is the SERP API the cheapest option?

Providers amortise proxy costs and parser maintenance across many customers. At their scale, residential proxy bandwidth costs them well under $0.01 per query; an individual scraper pays $0.005 to $0.02 per query for the same bandwidth and carries all the engineering time on parser fixes alone.

How do I rotate proxies in Python?

Pass a proxy URL to the requests session: session.proxies = {'http': 'http://user:pass@proxy:port', 'https': 'http://user:pass@proxy:port'}. Set both schemes, or HTTPS traffic bypasses the proxy. For rotation, maintain a list of proxy URLs and pick a fresh one per request, as in the sketch below. Use a residential provider; datacenter IPs get blocked.
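
A minimal rotation sketch (the pool URLs are placeholders):

import random
import requests

PROXY_POOL = [
    "http://user:pass@residential1.example.com:8000",
    "http://user:pass@residential2.example.com:8000",
]

def get_with_rotation(url, **kwargs):
    # One proxy per request, applied to both schemes
    proxy = random.choice(PROXY_POOL)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, **kwargs)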

Can I cache results to save cost?

Yes. SERP results are stable for 12 to 24 hours for most queries. Cache the response keyed on (query, country, language). For brand SERP monitoring, you may want a shorter TTL because your competitor positions can shift hour by hour.
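
For a single process, a dict keyed on (query, country, language) with a TTL check is enough. A sketch reusing scrape_google_api from Method 3; swap the dict for Redis or similar if you run multiple workers.

import time

CACHE = {}  # (query, country, language) -> (timestamp, results)
TTL_SECONDS = 12 * 3600  # 12h default; shorten for volatile SERPs

def cached_search(query, country="us", language="en"):
    key = (query, country, language)
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no API spend
    results = scrape_google_api(query)  # Method 3; hardcodes country=us
    CACHE[key] = (time.time(), results)
    return results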