The Ultimate Guide to Scraping Google SERPs in 2026 (for Free!)

By Serpent API Team · 14 min read

"Free" is the word that brings most people to Google scraping, and it is genuinely possible. You can pull real Google search results onto your own machine without paying a cent — for a while, at a small scale, if you are patient.

This guide is the honest version. It shows you the three free routes, gives you working Python and Node code, and teaches the bandwidth and error tricks that decide whether your scraper survives an afternoon or a month.

It also tells you where "free" quietly turns into a bill — proxies, maintenance, and your own hours — so you can make the call with eyes open instead of finding out at 3 a.m.

TL;DR: The free routes are raw HTTP (fast but mostly broken in 2026), a headless browser (works, costs CPU and bandwidth), and Google's official Custom Search API (sanctioned, but 100 queries/day and not the live SERP). Build with Puppeteer or Playwright, block images and fonts to slash bandwidth, paginate with start= since num=100 is gone, and add residential proxies only when one IP stops being enough. Past a few hundred queries a day, a SERP API is cheaper than your time.

The three free ways to scrape Google

There are exactly three ways to get Google results without paying a provider, and each one breaks somewhere different. Knowing where saves you days.

| Free route | What you get | Where it breaks |
| --- | --- | --- |
| Raw HTTP (requests, httpx, curl) | Milliseconds per call, no browser, trivial to write | Google now needs JavaScript, so you get a shell or a consent wall, and you're blocked fast |
| Headless browser (Puppeteer, Playwright, Selenium) | The real rendered SERP, every feature, exactly what a user sees | Heavy on CPU, RAM, and bandwidth; still detectable; needs proxies at any volume |
| Official Custom Search JSON API | Clean JSON, fully sanctioned by Google | 100 queries/day free, then paid and capped at 10k/day; it's a curated index, not the live web SERP |

Most "scrape Google for free" tutorials only show route one, and that is exactly why their code stopped working. We will use route two for real results and treat route three as a backup for tiny, sanctioned workloads. If you only need a trickle of queries, we put every free option through its paces in free Google search APIs, tested honestly.
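Route three gets no code elsewhere in this guide, so here is a minimal sketch of what a Custom Search JSON API call looks like using only the Python standard library. The key and engine ID are placeholders you would create in Google's console, the helper names build_cse_url and parse_cse_items are ours, and the sample payload is hand-written to mirror the items / title / link fields rather than copied from a real response.

```python
import json
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(query, api_key, engine_id, start=1):
    """Build a Custom Search JSON API request URL (start is 1-based)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


def parse_cse_items(payload):
    """Pull title/link pairs out of a decoded response.

    The 'items' key is missing when nothing matched, so default to [].
    """
    return [
        {"title": item.get("title"), "url": item.get("link")}
        for item in payload.get("items", [])
    ]


# Hand-written sample in the response shape, NOT real API output
sample = json.loads('{"items": [{"title": "Example", "link": "https://example.com/"}]}')

print(build_cse_url("best running shoes", "YOUR_API_KEY", "YOUR_ENGINE_ID"))
print(parse_cse_items(sample))
```

To run it for real, fetch the built URL with urllib.request.urlopen and pass the decoded JSON to parse_cse_items. Every call counts against the 100 queries/day free allowance.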

Why requests + BeautifulSoup returns nothing now

The classic five-line Python scraper is the first thing everyone tries. In 2026 it almost always comes back empty. Here is the code that used to work:

import requests
from bs4 import BeautifulSoup

url = "https://www.google.com/search?q=best+running+shoes&hl=en&gl=us"
html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text

soup = BeautifulSoup(html, "html.parser")
titles = [h3.get_text() for h3 in soup.select("h3")]
print(len(html), "bytes")
print(titles[:5])
# 2026: a small shell or a consent page -> titles is []

Two things changed. First, in early 2025 Google began requiring JavaScript to render the results page. A raw HTTP request downloads the HTML before any script runs, so the listings simply are not there yet.

Second, Google fingerprints the request itself. Python's requests has a TLS handshake and header set that no real Chrome ever produces, so even when a page loads you often get a consent wall or an "unusual traffic" notice instead of results.

The fix is not a cleverer header string. It is running a real browser that executes the JavaScript, which is route two.

Build a free scraper with a headless browser

A headless browser is a real Chromium with no visible window. It runs the page's JavaScript, so the rendered results actually exist in the DOM for you to read. The most popular free tools are Puppeteer (Node) and Playwright (Node or Python).

We'll use Puppeteer with the community stealth plugin, which patches the most obvious automation tells (like navigator.webdriver) before the page loads. Install it first:

npm install puppeteer-extra puppeteer-extra-plugin-stealth puppeteer

Here is a complete, working scraper. It preloads a consent cookie so the EU wall doesn't swallow the page, blocks heavy resources to save bandwidth, and reads each organic result by walking from the h3 heading up to its link.

const puppeteer = require('puppeteer-extra');
const Stealth = require('puppeteer-extra-plugin-stealth');
puppeteer.use(Stealth());

const BLOCK = new Set(['image', 'font', 'media', 'stylesheet']);

(async () => {
  const browser = await puppeteer.launch({
    headless: true, // current Puppeteer releases run the new headless mode by default
    args: [
      '--no-sandbox',
      '--disable-dev-shm-usage',
      '--disable-blink-features=AutomationControlled',
      '--blink-settings=imagesEnabled=false',
    ],
  });
  const page = await browser.newPage();

  // Cut bandwidth: drop resources the parser never reads
  await page.setRequestInterception(true);
  page.on('request', (req) =>
    BLOCK.has(req.resourceType()) ? req.abort() : req.continue()
  );

  // Preload consent so Google serves results, not a cookie wall
  await page.setCookie({ name: 'CONSENT', value: 'YES+', domain: '.google.com' });

  const q = 'best running shoes 2026';
  await page.goto(
    `https://www.google.com/search?q=${encodeURIComponent(q)}&hl=en&gl=us`,
    { waitUntil: 'domcontentloaded', timeout: 30000 }
  );

  const results = await page.$$eval('#search a h3', (nodes) =>
    nodes.map((h3, i) => {
      const a = h3.closest('a');
      return { position: i + 1, title: h3.innerText, url: a ? a.href : null };
    })
  );

  console.log(results);
  await browser.close();
})();

A few choices in there matter. We wait for domcontentloaded instead of networkidle0 or networkidle2 because the results land early and waiting for the network to go quiet just wastes seconds. We select #search a h3 and climb to the enclosing link with closest('a'), which survives Google's class-name churn far better than brittle generated classes like .yuRUbf.

If you prefer Python, the same logic works with Playwright: p.chromium.launch(), page.route() to block resources, and page.query_selector_all("#search a h3"). The shape is identical. For a Python-first walkthrough that pits three approaches against each other, see 3 ways to scrape Google results in Python, tested side by side.
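As a rough illustration of that Python shape, here is a sketch using Playwright's sync API. It assumes you have run pip install playwright and playwright install chromium. It leaves out the consent cookie and any stealth patching, so treat it as a starting point rather than a drop-in replacement for the Puppeteer script. The names should_block and scrape are ours.

```python
from urllib.parse import quote_plus

BLOCKED_TYPES = {"image", "font", "media", "stylesheet"}

# Runs inside the page: map each h3 to its title and the href of its enclosing link
EXTRACT_JS = """nodes => nodes.map((h3, i) => {
  const a = h3.closest('a');
  return { position: i + 1, title: h3.innerText, url: a ? a.href : null };
})"""


def should_block(resource_type):
    """True for resource types the parser never reads."""
    return resource_type in BLOCKED_TYPES


def scrape(query):
    """Load one results page and return a list of {position, title, url} dicts."""
    # Imported lazily so the helper above works even without Playwright installed
    from playwright.sync_api import sync_playwright

    url = f"https://www.google.com/search?q={quote_plus(query)}&hl=en&gl=us"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Abort heavy resources at the request layer, let everything else through
        page.route(
            "**/*",
            lambda route: route.abort()
            if should_block(route.request.resource_type)
            else route.continue_(),
        )
        page.goto(url, wait_until="domcontentloaded", timeout=30000)
        results = page.eval_on_selector_all("#search a h3", EXTRACT_JS)
        browser.close()
        return results
```

Calling scrape("best running shoes 2026") should hand back the same list of position, title, and url entries the Node version logs, subject to the same blocks and consent walls.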

Cut bandwidth: block images, fonts, and CSS

This is the single most important trick for keeping a scraper free, and almost no beginner tutorial mentions it. A full Google page pulls roughly 3 MB across dozens of requests, and the vast majority of those bytes are images you never parse.

When you scrape from your own IP, wasted bytes only cost you speed. The moment you add a residential proxy — billed per gigabyte — every blocked image is real money saved. That is why the scraper above aborts images, fonts, media, and stylesheets at the request layer.

The big lever is page.setRequestInterception(true) plus an abort() on heavy resource types, which we already wired in. Two cautions, though.

First, blocking stylesheets can occasionally hide elements your selectors depend on, so test that results still parse after you add it. Second, the launch flag --blink-settings=imagesEnabled=false stacks with interception and stops images at the engine level, which is belt-and-braces.

Done right, you can take a 3 MB page down to a few tens of kilobytes. On a per-gigabyte proxy plan that is the difference between pennies and dollars per thousand queries.
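To make the pennies-versus-dollars claim concrete, here is the arithmetic as a small sketch. The $5 per GB price and the 50 KB blocked page size are hypothetical inputs chosen for illustration, not quotes from any provider or measurements. Plug in your own plan and your own measured page weight.

```python
def proxy_cost_per_1000(page_bytes, usd_per_gb):
    """Proxy bandwidth cost in USD for 1,000 page loads."""
    # Decimal gigabytes (an assumption; some plans may bill in GiB instead)
    gigabytes = page_bytes * 1000 / 1_000_000_000
    return gigabytes * usd_per_gb


PRICE = 5.00  # hypothetical USD per GB

full = proxy_cost_per_1000(3_000_000, PRICE)  # roughly 3 MB, nothing blocked
lean = proxy_cost_per_1000(50_000, PRICE)     # hypothetical 50 KB after blocking

print(f"unblocked: ${full:.2f} per 1,000 queries")
print(f"blocked:   ${lean:.2f} per 1,000 queries")
```

With those assumed inputs the unblocked page costs about $15 per thousand queries and the blocked one about $0.25, which is why blocking comes before buying proxies.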

Getting 100 results after num=100 died

For years you could append &num=100 to a search URL and get the top 100 results in one request. On September 11, 2025, Google switched that off. Now you get 10 results per page regardless of the value you pass.

To reach 100 results you paginate with the start parameter — start=0, then 10, then 20, and so on — which means ten requests where one used to do the job. We cover the rank-tracking fallout in how the num=100 removal broke rank trackers.

Here is pagination with de-duplication, since overlapping pages sometimes repeat a URL, and a polite randomized delay so you don't fire ten identical requests in a tight loop:

async function scrapeGoogle(page, query, pages = 3) {
  const seen = new Set();
  const out = [];

  for (let p = 0; p < pages; p++) {
    const start = p * 10;
    await page.goto(
      `https://www.google.com/search?q=${encodeURIComponent(query)}` +
        `&hl=en&gl=us&start=${start}`,
      { waitUntil: 'domcontentloaded', timeout: 30000 }
    );

    const batch = await page.$$eval('#search a h3', (nodes) =>
      nodes.map((h3) => ({ title: h3.innerText, url: h3.closest('a')?.href }))
    );

    for (const r of batch) {
      if (r.url && !seen.has(r.url)) {
        seen.add(r.url);
        out.push({ position: out.length + 1, ...r });
      }
    }

    // Be human: wait 4-8s between pages, never hammer
    if (p < pages - 1) {
      await new Promise((res) => setTimeout(res, 4000 + Math.random() * 4000));
    }
  }
  return out;
}

That randomized delay is not optional decoration. Request volume and rhythm are the two signals most likely to get you blocked, and ten pages is ten times more exposure than one. Slower is the single most effective free defense you have.
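If you are working in Python, the pure parts of that loop (building the start= URLs and merging pages without duplicates) can be sketched without any browser at all. The function names page_urls and merge_batches are ours, and the rows are assumed to be dicts with title and url keys like the ones the scraper returns.

```python
from urllib.parse import urlencode


def page_urls(query, pages=3, per_page=10):
    """One search URL per page: start=0, 10, 20, ..."""
    base = "https://www.google.com/search"
    urls = []
    for p in range(pages):
        params = {"q": query, "hl": "en", "gl": "us", "start": p * per_page}
        urls.append(f"{base}?{urlencode(params)}")
    return urls


def merge_batches(batches):
    """Flatten per-page results, skipping repeated or missing URLs.

    Positions are renumbered 1..N in the order rows are first seen.
    """
    seen, out = set(), []
    for batch in batches:
        for row in batch:
            url = row.get("url")
            if url and url not in seen:
                seen.add(url)
                out.append({**row, "position": len(out) + 1})
    return out


for u in page_urls("best running shoes 2026"):
    print(u)
```

Keeping these two steps separate from the browser code also makes them easy to unit test, which matters once Google's markup shifts and you need to know which layer broke.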

When free runs out: adding a proxy

One IP is fine for a few dozen queries a day. Push past that and your home or server IP starts answering every search with a CAPTCHA. The fix is to route requests through proxies so Google sees many different IPs.

Two types matter. Datacenter proxies are cheap and fast, but they come from known cloud ranges that Google distrusts on sight, so they burn out quickly for search. Residential proxies route through real consumer ISP connections, look like ordinary users, and survive far longer — but they cost more and are billed per gigabyte, which is why the bandwidth section above matters so much.

Wiring a proxy into Puppeteer is two lines: a launch flag for the gateway and page.authenticate() for the credentials. The pattern is identical no matter which provider you choose.

const browser = await puppeteer.launch({
  headless: true, // current Puppeteer releases run the new headless mode by default
  args: [
    '--no-sandbox',
    '--proxy-server=http://gateway.your-proxy.com:7000',
  ],
});
const page = await browser.newPage();

// HTTP Basic auth for the proxy
await page.authenticate({
  username: 'YOUR_PROXY_USERNAME',
  password: 'YOUR_PROXY_PASSWORD',
});

Plenty of providers offer this with a per-gigabyte, pay-as-you-go plan — Bright Data, Oxylabs, Decodo, IPRoyal, DataImpulse, and SOAX among them. Pick any one; the wiring does not change. Just remember that a clean IP does not save a robotic request: if your headers, timing, or fingerprint still scream bot, the prettiest residential IP gets blocked anyway, which is the theme of why proxies get banned on Google.

Reality check: a production-grade free scraper is really three systems glued together — a rotating residential proxy pool, a fleet of headless browsers, and block-detection plus retry logic. Each needs maintenance forever. We add up the real bill in the true cost of a Google scraper in 2026.

The errors you will hit (429, CAPTCHA, empty)

Three failures will eat most of your debugging time. Recognizing them quickly is half the battle.

HTTP 429 / "unusual traffic". This is the soft block. Google has decided your IP or fingerprint looks automated and asks you to slow down or solve a CAPTCHA. Do not retry immediately — that confirms you're a bot. Back off exponentially with jitter:

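import random
import time

# Either phrase in a 200 body means the page is not usable results
BAD_BODY = ("unusual traffic", "did not match any documents")

def get_with_backoff(fetch, max_tries=5):
    for attempt in range(max_tries):
        resp = fetch()
        if resp.status_code == 200 and not any(s in resp.text for s in BAD_BODY):
            return resp
        if attempt < max_tries - 1:  # no point sleeping before giving up
            delay = (2 ** attempt) + random.uniform(0, 1)  # ~1s, 2s, 4s, 8s plus jitter
            time.sleep(delay)
    raise RuntimeError("blocked after retries")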

CAPTCHA pages. Sometimes Google returns a 200 OK whose body is a reCAPTCHA, not results. Always check the body, not just the status code, and treat a results count of zero on a popular query as a block, not a real empty result.
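One way to put "check the body, not just the status code" into practice is a small classifier that runs on every fetch. This is a sketch: the marker strings are assumptions based on what block pages have commonly contained, so adjust them to what you actually observe, and the classify name and its labels are ours.

```python
# Assumed markers of a block page; tune these against real responses
BLOCK_MARKERS = ("unusual traffic", "recaptcha", "/sorry/")


def classify(status_code, body, result_count):
    """Label a fetch as 'ok', 'blocked', or 'suspect_empty'."""
    text = body.lower()
    if status_code != 200 or any(marker in text for marker in BLOCK_MARKERS):
        return "blocked"
    if result_count == 0:
        # A 200 with zero results on a popular query is more likely a block
        return "suspect_empty"
    return "ok"


print(classify(200, "<html>ten organic results</html>", 10))
print(classify(200, "Our systems have detected unusual traffic", 0))
print(classify(200, "<html></html>", 0))
```

Only rows labelled ok should reach your dataset. Anything else goes to the backoff path instead of being stored as a real empty result.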

Silent empty results. The most dangerous failure: a valid page where your selectors return nothing because Google nudged the markup. Your code keeps running and logs success while collecting zero rows. Assert on result counts and alert when they drop — the same silent-failure problem we unpack in why your SERP scraper breaks at 3 a.m.
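A simple guard against that silent failure is to compare each run's result count with a recent baseline. The sketch below is one possible rule, not a standard: it flags a run when the count falls below half the median of earlier runs. The 0.5 threshold and the sample history are made up for illustration.

```python
from statistics import median


def count_dropped(history, current, min_ratio=0.5):
    """True when current falls below min_ratio of the recent median count."""
    if not history:
        return False  # no baseline yet, nothing to compare against
    return current < median(history) * min_ratio


recent = [10, 9, 10, 10, 8]  # made-up counts from earlier runs

for count in (9, 0):
    if count_dropped(recent, count):
        print(f"ALERT: only {count} results, selectors may have broken")
    else:
        print(f"ok: {count} results")
```

Wire the alert branch to whatever you already watch (email, chat webhook, a failing cron exit code) so a markup change surfaces in minutes instead of after a week of empty rows.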

Is "free" actually free?

For a weekend project, yes. For anything ongoing, "free" is a budget line that moved from your wallet to your calendar. Here is the honest comparison.

| Factor | Free DIY scraper | SERP API |
| --- | --- | --- |
| Cash cost | $0 at tiny scale; residential bandwidth per GB at volume | Flat per-call; deep results don't multiply price |
| Setup time | Hours to days, then tuning | Minutes: one HTTP request |
| Maintenance | Forever: proxies, browsers, selectors, fingerprints all rot | None: access and parsing handled for you |
| Block rate | Unpredictable; spikes when defenses change | Effectively zero from your side |
| JS rendering | You run and pay for headless browsers | Handled; you get parsed JSON |

The break-even is lower than people expect. Once you need steady daily volume, the combined cost of proxy bandwidth and the engineering hours to keep a scraper alive usually passes the price of an API. We lay the two approaches side by side in web scraping vs a SERP API and Google Search API vs scraping.
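You can run the break-even for your own situation with a few lines of arithmetic. Every number below is a hypothetical placeholder (query volume, bandwidth per thousand queries, proxy price, upkeep hours, hourly rate, and API price), so read the output as an example of the calculation, not as a pricing claim.

```python
def diy_monthly_cost(queries_per_day, gb_per_1000, usd_per_gb,
                     upkeep_hours, usd_per_hour):
    """Proxy bandwidth plus the value of maintenance time, per 30-day month."""
    thousands = queries_per_day * 30 / 1000
    return thousands * gb_per_1000 * usd_per_gb + upkeep_hours * usd_per_hour


def api_monthly_cost(queries_per_day, usd_per_1000):
    """Flat per-call API pricing over the same 30-day month."""
    return queries_per_day * 30 / 1000 * usd_per_1000


# All inputs are hypothetical; substitute your own
diy = diy_monthly_cost(queries_per_day=500, gb_per_1000=0.05, usd_per_gb=5.0,
                       upkeep_hours=4, usd_per_hour=50.0)
api = api_monthly_cost(queries_per_day=500, usd_per_1000=1.0)

print(f"DIY: ${diy:.2f}/month, API: ${api:.2f}/month")
```

In this made-up scenario the bandwidth is a few dollars and nearly all of the DIY cost is the four hours of upkeep, which is the pattern to look for in your own numbers.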

The one-call alternative

If you'd rather skip the proxy pool, the headless fleet, and the CAPTCHA-watching entirely, a SERP API does the access for you and hands back clean JSON. Here is the whole thing in Python with the Serpent Google SERP API:

import requests

resp = requests.get(
    "https://api.apiserpent.com/api/search",
    headers={"X-API-Key": "sk_live_your_key"},
    params={
        "q": "best running shoes 2026",
        "engine": "google",
        "country": "us",
        "num": 100,          # top 100 in ONE call
    },
    timeout=30,
)
resp.raise_for_status()  # stop on a 4xx/5xx instead of parsing an error body

data = resp.json()
for r in data["results"]["organic"]:
    print(r["position"], r["title"], r["url"])

That single call returns up to 100 organic results — the &num=100 behavior Google removed, restored at the API layer — plus ads, People Also Ask, related searches, featured snippets, AI Overviews, and the local pack. With Serpent, page depth does not multiply the price: a 100-result deep search costs the same as a 10-result one.

It's not just Google. Switch engine to bing, yahoo, or ddg, or hit the dedicated news and image endpoints — same clean JSON, no per-engine block-fighting. You can try any query live in the playground before writing a line of code.

Skip the proxies. Just get the data.

Serpent handles access — proxies, rendering, and CAPTCHAs are our problem, not yours — and returns clean JSON for Google, Bing, Yahoo, and DuckDuckGo. Get 10 free Google searches on signup, then pay-as-you-go from $0.03 per 10,000 searches at scale, with no charge for deep results and no subscription.


Is scraping Google legal?

Scraping publicly available web pages is broadly treated as legal in the United States, but "legal" and "allowed by the site" are not the same thing.

The key precedent is hiQ Labs v. LinkedIn, where the Ninth Circuit held that scraping public data — data you can see without logging in — likely does not violate the Computer Fraud and Abuse Act, following the Supreme Court's narrow reading of the CFAA in Van Buren.

The caveats still matter: risk rises sharply when you scrape behind a login, collect personal data without a lawful basis, reproduce copyrighted content, or overload a server with volume. Terms of service are a separate question from criminal law. We go deeper in is scraping Google legal in 2026. None of this is legal advice — if it matters to your business, talk to a lawyer.

The practical takeaway: scrape responsibly at low volume from your own IP, and the moment you need scale, a SERP API keeps you on the well-trodden path of consuming an API instead of pounding a website.

FAQ

Can you really scrape Google search results for free?

Yes, for small volumes. A single computer with a headless browser can pull Google results at no cash cost as long as you go slow, send realistic browser headers, and accept the occasional block. The free part falls apart at scale, where you need proxies billed per gigabyte plus constant maintenance — the point where a paid API becomes cheaper than your own time.

Why do requests and BeautifulSoup return no results from Google now?

Since early 2025 Google requires JavaScript to render its results page, so a plain HTTP request downloads a near-empty shell or a consent wall instead of the listings. BeautifulSoup then finds no result nodes because they were never in the raw HTML. You need a real browser that runs the page's JavaScript, such as Puppeteer or Playwright.

Is the num=100 parameter really gone, and how do I get 100 results?

Google switched off num=100 on September 11, 2025, so you now get 10 results per page no matter what value you pass. To reach the top 100 you paginate with the start parameter (start=0, 10, 20, and so on), which is ten separate requests where one used to do the job.

Do I need a proxy to scrape Google for free?

Not for a handful of queries from one IP at a slow pace. But your home or server IP gets rate-limited fast, and once you need steady volume you must rotate residential proxies, which are billed per gigabyte. That bandwidth bill, plus the upkeep, is what turns free DIY scraping into a real cost.

Is it legal to scrape Google search results?

Scraping publicly available pages is broadly treated as legal in the US after hiQ v. LinkedIn, where the Ninth Circuit held that scraping public data with no login likely does not violate the CFAA. It can still breach a site's terms of service, and rules differ by country. This is general information, not legal advice.