Scrape Google Search Results in Python (2026): 3 Methods Tested Side by Side

By Anurag Pathak · 16 min read

Most "scrape Google with Python" tutorials show one method — whichever method the author is selling. The DIY ones tell you BeautifulSoup is enough. The proxy companies tell you headless browsers are the answer. The SERP API companies say only an API works. They are all partially right. The truth depends on what you are doing.

I built three working scrapers in Python in May 2026, ran the same 50 queries through each, and recorded what actually happened. This guide is the result — the most thorough comparison you will find on this topic, with code that you can copy and run today.

The Three Methods

  1. DIY: requests + BeautifulSoup, with rotating residential proxies and realistic headers.
  2. Headless browser: Playwright running real Chromium, residential proxy attached.
  3. SERP API: a managed Google scraping API (Serpent API in the test).

Same 50 queries, same machine, same hour. I logged success rate, time per query, total cost, and engineering time to ship.

Headline Results

| Method | Success rate | Time / query | Cost / 1K queries | Setup time | Maintenance |
|---|---|---|---|---|---|
| DIY (requests + BS4) | 62% | 3.5s + retries | ~$2–$5 (residential proxy) | ~6 hours | Constant (parser breaks) |
| Headless (Playwright) | 88% | 6–10s | ~$5–$15 (proxy + browser) | ~8 hours | Moderate |
| SERP API (Serpent) | 99% | 1.6s | ~$0.30 | ~30 minutes | None |

The SERP API wins by every metric. The DIY and headless methods are still useful for specific cases — mostly research, learning, or when the SERP API is unavailable for your jurisdiction. For production, the API is the only sensible choice in 2026.

Method 1: DIY with requests + BeautifulSoup

The cheapest-looking method. Also the most fragile.

import requests
from bs4 import BeautifulSoup
from urllib.parse import quote_plus
import random, time

PROXIES = [
    "http://user:pass@residential1.example.com:8000",
    "http://user:pass@residential2.example.com:8000",
    # ...rotate through a pool of 20+ residential IPs
]

USER_AGENTS = [
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5) AppleWebKit/605.1.15...",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    # ...10+ realistic UAs
]

def scrape_google_diy(query):
    # quote_plus handles spaces AND special characters; a bare
    # .replace(' ', '+') breaks on queries containing &, #, etc.
    url = f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us"
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "text/html,application/xhtml+xml...",
        "Accept-Language": "en-US,en;q=0.9",
    }
    # Pick ONE proxy per request and use it for both schemes;
    # routing http and https through different IPs is a bot signal
    proxy_url = random.choice(PROXIES)
    proxies = {"http": proxy_url, "https": proxy_url}
    time.sleep(random.uniform(1, 3))  # jitter to avoid a fixed cadence

    r = requests.get(url, headers=headers, proxies=proxies, timeout=20)
    if r.status_code != 200 or "Our systems have detected" in r.text:
        return None  # CAPTCHA or block

    soup = BeautifulSoup(r.text, "html.parser")
    results = []
    for el in soup.select("div.g"):  # selectors break frequently
        title_el = el.select_one("h3")
        link_el = el.select_one("a")
        if not (title_el and link_el):
            continue
        results.append({
            "title": title_el.get_text(strip=True),
            "url": link_el.get("href", ""),
        })
    return results

What happened in the test

In the test run, 31 of the 50 queries came back parseable (the 62% in the headline table); the rest hit CAPTCHAs or block pages, even with rotating residential proxies. This method works for learning or for one-off scrapes of a few hundred queries. It fails in production at any scale because the maintenance is constant.
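
The "+ retries" in the headline table is doing real work. Here is a minimal retry wrapper, assuming exponential backoff is an acceptable policy; the delays are illustrative, not the test harness's values.

import time

def scrape_with_retries(query, max_attempts=3):
    # Each attempt re-enters scrape_google_diy, which picks a fresh
    # proxy and User-Agent, so every retry is also an identity rotation.
    for attempt in range(max_attempts):
        results = scrape_google_diy(query)
        if results is not None:
            return results
        time.sleep(2 ** attempt * 5)  # back off: 5s, 10s, 20s
    return None  # give up; log the query for a later pass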

Method 2: Headless Browser (Playwright)

A real browser sidesteps most of the bot detection that catches the requests-based scraper. Slower and more expensive, but more reliable.

from playwright.async_api import async_playwright
from urllib.parse import quote_plus
import asyncio

async def scrape_google_headless(query, proxy_url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            proxy={"server": proxy_url}, headless=True
        )
        # A realistic fingerprint: desktop UA, common viewport, pinned locale
        ctx = await browser.new_context(
            user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5)...",
            viewport={"width": 1920, "height": 1080},
            locale="en-US",
        )
        page = await ctx.new_page()
        await page.goto(
            f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us",
            wait_until="networkidle",
        )
        if await page.locator("form#captcha-form").count():
            await browser.close()
            return None  # CAPTCHA

        # Extract title/URL pairs in a single page-side evaluation
        results = await page.eval_on_selector_all(
            "div.g",
            """elements => elements.map(el => ({
                title: el.querySelector('h3')?.innerText || '',
                url: el.querySelector('a')?.href || '',
            }))"""
        )
        await browser.close()
        return [r for r in results if r["title"] and r["url"]]

# Run
PROXY_URL = "http://user:pass@residential1.example.com:8000"
results = asyncio.run(scrape_google_headless("best protein powder", PROXY_URL))

What happened in the test

In the test run, 44 of the 50 queries succeeded (the 88% in the headline table). This method works for production at modest scale (under 50K queries a month) when you can absorb the cost. Beyond that, browser overhead and proxy bandwidth start to dominate.
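
At 6 to 10 seconds per query, throughput comes from concurrency rather than raw speed. Here is a minimal fan-out sketch; the cap of 5 parallel browsers is an illustrative number, not a value from the test.

import asyncio

async def scrape_many(queries, proxy_url, max_concurrent=5):
    # Cap parallel launches: each headless Chromium instance is heavy
    # on RAM, and too many simultaneous hits burns the proxy pool faster.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(query):
        async with sem:
            return query, await scrape_google_headless(query, proxy_url)

    pairs = await asyncio.gather(*(bounded(q) for q in queries))
    return dict(pairs)

# all_results = asyncio.run(scrape_many(["query one", "query two"], PROXY_URL))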

Method 3: SERP API

The boring, working method. One HTTP call.

import requests

API_KEY = "sk_live_your_key_here"

def scrape_google_api(query):
    r = requests.get("https://apiserpent.com/api/search", params={
        "q": query, "engine": "google", "country": "us",
        "api_key": API_KEY,
    }, timeout=30)
    r.raise_for_status()  # surface quota/auth errors instead of parsing them
    data = r.json()
    return data.get("organic_results", [])

What happened in the test

For 99% of teams scraping Google in 2026, this is the only method that makes economic sense.
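
Because each call is a single stateless HTTP request, scaling up is a plain thread pool; there are no proxies, browsers, or CAPTCHA states to coordinate. A sketch (the worker count is arbitrary):

from concurrent.futures import ThreadPoolExecutor

def scrape_batch(queries, max_workers=10):
    # Each query maps to one independent API call
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(queries, pool.map(scrape_google_api, queries)))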

The Cost Math at Scale

The interesting comparison is what each method costs at real volumes once you include engineering time:

| Volume / month | DIY (proxy + eng time) | Headless (proxy + eng time) | SERP API (Serpent Scale) |
|---|---|---|---|
| 1,000 | $2 + 4 hrs/mo eng = ~$402 | $5 + 4 hrs/mo eng = ~$405 | $0.30 |
| 10,000 | $20 + 8 hrs/mo eng = ~$820 | $50 + 8 hrs/mo eng = ~$850 | $3 |
| 100,000 | $200 + 16 hrs/mo eng = ~$1,800 | $500 + 16 hrs/mo eng = ~$2,100 | $30 |
| 1,000,000 | $2,000 + 40 hrs/mo eng = ~$6,000 | $5,000 + 40 hrs/mo eng = ~$9,000 | $300 |

I am pricing engineering time at $100/hour, which is conservative. Even if you do not value your time at a market rate, the proxy cost alone runs roughly 7× to 50× higher on DIY/headless than on the SERP API, using the per-1K figures from the headline table.
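
To plug in your own volumes and hourly rate, the table reduces to one line of arithmetic. A sketch using the figures above; the maintenance-hour estimates are observations from the test, not outputs of a formula.

def monthly_cost(queries, proxy_per_1k, eng_hours, hourly_rate=100):
    # Total monthly cost = proxy spend + engineering time
    return queries / 1000 * proxy_per_1k + eng_hours * hourly_rate

print(monthly_cost(10_000, proxy_per_1k=2.0, eng_hours=8))   # DIY: 820.0
print(monthly_cost(10_000, proxy_per_1k=0.30, eng_hours=0))  # SERP API: 3.0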

When DIY or Headless Still Wins

To stay honest, three cases where direct scraping is the right call:

  1. You are learning. Building a scraper in BS4 is excellent practice. Just do not ship it to production.
  2. You need a SERP feature no API exposes. Edge cases like "extract the exact pixel position of an ad badge". Most APIs expose all the structured data you need; if you are in the rare exception, headless is the path.
  3. You are in a region SERP APIs cannot serve. A handful of jurisdictions block the major SERP API CDNs. If you are running from one of them, direct scraping with a local proxy may be your only choice.

The Decision Tree

  1. Learning to scrape, or a one-off run of a few hundred queries? DIY with requests + BeautifulSoup.
  2. Need a SERP detail no API exposes, such as pixel-level layout? Headless Playwright.
  3. Running from a jurisdiction the SERP API CDNs cannot serve? Direct scraping with a local proxy.
  4. Anything else, at any production volume? A SERP API.

Common Mistakes

  1. Using datacenter proxies. Google blocks them within a few queries. Always use residential or mobile.
  2. Reusing one User-Agent. Pair UA rotation with IP rotation; rotating one signal but not the other still looks like a bot.
  3. Ignoring hl and gl. Google's results vary by language and country. Pin them explicitly to get reproducible data.
  4. Hitting Google at peak hours from a fresh IP. CAPTCHAs spike at peak hours. Rate-limit your scraper and run during off-peak.
  5. Not handling the "Did you mean" rewrite. Google sometimes auto-corrects your query. The response will still parse, but the data is for a different query. Check for the rewrite warning (see the sketch after this list).
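
Mistakes 3 and 5 are cheap to guard against in code. Here is a minimal sketch; the rewrite marker strings are an assumption about Google's current English-language markup, so verify them against live HTML before relying on them.

import requests

# Assumed markers for Google's auto-correct banner (English SERPs);
# these are not a documented API and need verifying against live HTML.
REWRITE_MARKERS = ("Showing results for", "Including results for")

def was_query_rewritten(html):
    # Detect Google silently auto-correcting the query
    return any(marker in html for marker in REWRITE_MARKERS)

r = requests.get(
    "https://www.google.com/search",
    params={
        "q": "best protien powder",  # deliberate typo to trigger a rewrite
        "hl": "en",                  # pin language...
        "gl": "us",                  # ...and country for reproducible data
    },
    headers={"User-Agent": "Mozilla/5.0 ..."},  # rotate as in Method 1
    timeout=20,
)
if was_query_rewritten(r.text):
    print("Result set is for a corrected query; discard or re-run verbatim")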

Legal Footnote

Reading public SERP pages has been upheld as lawful under public-data court precedent in major markets, most prominently the US hiQ v. LinkedIn rulings. Google's terms of service forbid automated access, so contractually you are violating the ToS, but a ToS is not a criminal statute. Most teams choose a SERP API because it shifts the legal exposure to a vendor with a formal compliance posture, not because direct scraping is illegal.

If you are scraping for competitive intelligence at any meaningful scale, talk to a lawyer regardless of which method you pick.

Skip the CAPTCHAs and Parser Maintenance

Serpent API gives every new account 10 free Google searches with full SERP feature parsing including AI Overview text and source citations — no credit card. Pay-as-you-go after that, from $0.30 per 1,000 quick searches at Scale tier.

Get Your Free API Key

Explore: SERP API · Playground · Web Scraping vs SERP API

FAQ

Can you still scrape Google with BeautifulSoup in 2026?

Technically yes, practically no. A naive script gets a CAPTCHA within 20 to 50 queries. With heavy effort (rotating residential proxies, realistic headers) you can extend the run, but the parser keeps breaking. For more than a few hundred queries a month, the method is no longer worth the engineering time.

How does a headless browser perform vs a SERP API?

Headless Playwright + residential proxy hits ~88% success in 6 to 10 seconds at $5 to $15 per 1,000. SERP API hits 99% in 1 to 3 seconds at $0.30 per 1,000.

Is scraping Google legal?

Reading public SERP pages has been upheld as lawful under US public-data court precedent. Google's ToS forbids automated access. Most teams use a SERP API to shift the legal exposure rather than because direct scraping is criminal.

Why is the SERP API the cheapest option?

Providers amortise proxy costs and parser maintenance across many customers. At their scale, residential proxy bandwidth costs them well under $0.01 per query; an individual scraper pays $0.005 to $0.02 per query for the same bandwidth and carries all the engineering time on parser fixes alone.

How do I rotate proxies in Python?

Pass a proxy URL to the requests session: session.proxies = {'http': 'http://user:pass@proxy:port', 'https': 'http://user:pass@proxy:port'}. Set both schemes, or HTTPS traffic bypasses the proxy. For rotation, maintain a list of proxy URLs and pick a fresh one per request, as in the sketch below. Use a residential provider; datacenter IPs get blocked.
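
A minimal rotation sketch (the pool URLs are placeholders):

import random
import requests

PROXY_POOL = [
    "http://user:pass@residential1.example.com:8000",
    "http://user:pass@residential2.example.com:8000",
]

def get_with_rotation(url, **kwargs):
    # One proxy per request, applied to both schemes
    proxy = random.choice(PROXY_POOL)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, **kwargs)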

Can I cache results to save cost?

Yes. SERP results are stable for 12 to 24 hours for most queries. Cache the response keyed on (query, country, language). For brand SERP monitoring, you may want a shorter TTL because your competitor positions can shift hour by hour.
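
For a single process, a dict keyed on (query, country, language) with a TTL check is enough. A sketch reusing scrape_google_api from Method 3; swap the dict for Redis or similar if you run multiple workers.

import time

CACHE = {}  # (query, country, language) -> (timestamp, results)
TTL_SECONDS = 12 * 3600  # 12h default; shorten for volatile SERPs

def cached_search(query, country="us", language="en"):
    key = (query, country, language)
    hit = CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]  # cache hit: no API spend
    results = scrape_google_api(query)  # Method 3; hardcodes country=us
    CACHE[key] = (time.time(), results)
    return results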