How to Run Millions of SERP Requests Without Getting Rate-Limited

By Anurag Pathak · May 17, 2026 · 12 min read


The first time anyone runs a big SERP job, they write this:

const results = await Promise.all(
  millionKeywords.map(k => getSerp(k))   // please don't
);

It works for 50 keywords. At 50,000 it fires 50,000 requests in the same second, your success rate falls off a cliff, your memory balloons, and you spend the afternoon reading 429 Too Many Requests stack traces.

This article is the boring, reliable way to do it instead. It is mostly four ideas: bound your concurrency, retry politely, make work resumable, and don’t make calls you don’t need. None of it is clever. All of it is necessary.

The Mistake Everyone Makes First

Rate limits feel like the provider’s fault. They almost never are. Promise.all over a large array has unbounded concurrency: the map call starts every request immediately and Promise.all merely waits for them all. It does not do 1,000 keywords politely, it does 1,000 keywords at once. Any sane API answers a 1,000-wide burst with HTTP 429.

The fix is not a bigger plan. It is a smaller, steadier flow. A calm 8 requests in flight that all succeed beats 800 that mostly fail and retry into each other.

The Mental Model: A Pipeline, Not a Loop

Stop thinking “loop over keywords.” Start thinking “a queue feeding a fixed set of workers, writing results as they go.”

[Figure: a bounded-concurrency SERP pipeline. Think pipeline, not for-loop: a queue feeding a fixed set of workers.]

Everything below is just filling in that picture.

1. Bound Your Concurrency

A worker pool is a fixed number of async loops pulling from a shared queue. No library required:

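async function runPool(items, worker, concurrency = 8) {
  const results = new Array(items.length);
  let next = 0; // shared cursor; avoids repeated shift() on a million-element array
  async function loop() {
    while (next < items.length) {
      const i = next++;                      // claim an index before awaiting
      results[i] = await worker(items[i]);   // results stay in input order
    }
  }
  // Start `concurrency` loops; each takes the next item as soon as it is free.
  await Promise.all(Array.from({ length: concurrency }, loop));
  return results;
}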

await runPool(millionKeywords, k => getSerp(k), 8);

Now there are never more than 8 requests in flight, no matter how long the list is. Start at 8, watch your success rate and p95 latency, and only raise it while both stay healthy. The moment errors climb or latency spikes, you have found your ceiling — back off one notch and stay there. Keep within your provider’s documented concurrency for your pricing tier.
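"Watch your success rate and p95 latency" needs something to watch. Here is a minimal sketch of a sliding-window tracker. The names createStats, record, snapshot and track are made up for this example, and the 500-sample window is an arbitrary assumption, not a recommendation from any provider.

```javascript
// Hypothetical helper: remembers the outcome of recent calls so you can read
// success rate and p95 latency while tuning concurrency.
function createStats(windowSize = 500) {
  const samples = []; // { ok: boolean, ms: number }, newest last

  function record(ok, ms) {
    samples.push({ ok, ms });
    if (samples.length > windowSize) samples.shift(); // keep a sliding window
  }

  function snapshot() {
    if (samples.length === 0) return { successRate: 1, p95: 0, count: 0 };
    const okCount = samples.filter(s => s.ok).length;
    const sorted = samples.map(s => s.ms).sort((a, b) => a - b);
    // Nearest-rank p95: smallest value with at least 95% of samples at or below it.
    const rank = Math.ceil((95 * sorted.length) / 100);
    return {
      successRate: okCount / samples.length,
      p95: sorted[rank - 1],
      count: samples.length,
    };
  }

  // Wraps an async worker so every call is timed and counted, then
  // returns or rethrows exactly as the original worker would.
  function track(worker) {
    return async (item) => {
      const start = Date.now();
      try {
        const out = await worker(item);
        record(true, Date.now() - start);
        return out;
      } catch (e) {
        record(false, Date.now() - start);
        throw e;
      }
    };
  }

  return { record, snapshot, track };
}
```

Wrap the worker once, for example runPool(keywords, stats.track(k => getSerp(k)), 8), and log stats.snapshot() every few seconds. If successRate drops or p95 climbs after you raise concurrency, that is the ceiling.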

2. Backoff With Jitter (the part people skip)

Some requests will still fail — a transient 429, a 503, a network hiccup. Retrying immediately just hits the wall again. Retrying on a schedule everyone else also uses is worse: every client wakes up on the same tick and stampedes the endpoint. AWS has a classic write-up on exactly this, Exponential Backoff And Jitter.

[Figure: latency and retry curves, showing why exponential backoff needs jitter to avoid synchronized spikes. Jitter is the difference between a queue and a stampede.]

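async function withRetry(fn, { tries = 4, base = 500 } = {}) {
  for (let attempt = 0; attempt < tries; attempt++) {
    try { return await fn(); }
    catch (e) {
      // Assumes fn() throws errors that carry the HTTP status as e.status.
      // Errors without a status (for example raw network failures) are not retried here.
      const retryable = [429, 502, 503, 504].includes(e.status);
      if (!retryable || attempt === tries - 1) throw e; // give up: rethrow the last error
      const wait = base * 2 ** attempt + Math.random() * base; // exponential backoff + jitter, in ms
      await new Promise(r => setTimeout(r, wait));
    }
  }
}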

Two rules that save you: only retry retryable statuses (retrying a 400 forever is just a slow bug), and always honour a Retry-After header if the response sends one — the server is telling you exactly how long to wait.
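Honouring Retry-After takes a little parsing, because the header carries either a number of seconds or an HTTP date. A minimal sketch follows. parseRetryAfter and nextWait are hypothetical helper names, and the 60-second maxWait cap is an assumption added for safety. How you read the header off a failed response depends on your HTTP client.

```javascript
// Returns the delay in milliseconds requested by a Retry-After header value,
// or null if the value is missing or cannot be parsed.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const text = String(value).trim();
  if (/^\d+$/.test(text)) return Number(text) * 1000; // delay-in-seconds form
  const when = Date.parse(text);                        // HTTP-date form
  if (Number.isNaN(when)) return null;
  return Math.max(0, when - now);                       // a date in the past means "retry now"
}

// Picks the wait before the next attempt: the server's hint when there is one,
// otherwise exponential backoff plus jitter (base * 2^attempt + random(0, base)).
// `maxWait` is an assumed cap so one bad header cannot stall a worker for hours.
function nextWait(attempt, retryAfterValue, { base = 500, maxWait = 60000, random = Math.random } = {}) {
  const hinted = parseRetryAfter(retryAfterValue);
  const backoff = base * 2 ** attempt + random() * base;
  return Math.min(maxWait, hinted ?? backoff);
}
```

Inside the catch block of a retry loop you would compute the wait with nextWait(attempt, headerValue) instead of the bare formula, where headerValue is whatever your client exposes for the Retry-After response header.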

3. Make Every Job Resumable

At a million keywords, something will interrupt you — a deploy, an OOM, a laptop lid. If a crash at keyword 600,000 means starting from zero, you do not have a pipeline, you have a slot machine.

Treat each keyword as an idempotent job with its own status, and write results as you go:

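// jobs table: (keyword, status), where status is pending | done | failed
// This fetches one batch of up to 50,000 pending rows; repeat until the query returns nothing.
const pending = await db.jobs.where({ status: 'pending' }).limit(50000);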

await runPool(pending, async (job) => {
  try {
    const data = await withRetry(() => getSerp(job.keyword));
    await db.results.put(job.keyword, data);          // write immediately
    await db.jobs.update(job.keyword, { status: 'done' });
  } catch (e) {
    await db.jobs.update(job.keyword, { status: 'failed', error: e.message });
  }
}, 8);

Restarting now skips everything already done. You can also re-run only the failed rows later. This is the same idempotency discipline a good rank tracker needs, just bigger — and it pairs naturally with the backup-plan architecture from the companion post. For the build-it-yourself counterpart — running your own scrapers rather than consuming the API — see SERP scraping at scale: queues, circuit breakers and caching.
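The snippet above handles one batch of pending rows. To work through a full table, and to re-queue failures afterwards, something has to loop. Here is a minimal sketch of that loop, using an in-memory Map as a stand-in for the jobs table. createJobStore, takePending, resetFailed and drain are all hypothetical names for illustration; a real store would be your database.

```javascript
// Hypothetical in-memory stand-in for the jobs table, for illustration only.
function createJobStore(keywords) {
  const jobs = new Map(keywords.map(k => [k, { keyword: k, status: 'pending' }]));
  return {
    // Returns up to `limit` jobs that are still pending.
    async takePending(limit) {
      const out = [];
      for (const job of jobs.values()) {
        if (job.status === 'pending') out.push(job);
        if (out.length === limit) break;
      }
      return out;
    },
    async update(keyword, patch) { Object.assign(jobs.get(keyword), patch); },
    // Flips failed jobs back to pending so the next drain retries only those.
    async resetFailed() {
      let n = 0;
      for (const job of jobs.values()) {
        if (job.status === 'failed') { job.status = 'pending'; delete job.error; n++; }
      }
      return n;
    },
    count(status) { return [...jobs.values()].filter(j => j.status === status).length; },
  };
}

// Processes pending jobs batch by batch until none remain.
// `processBatch` must mark every job it receives as done or failed,
// otherwise the same rows come back forever.
async function drain(store, processBatch, batchSize = 50000) {
  let total = 0;
  for (;;) {
    const batch = await store.takePending(batchSize);
    if (batch.length === 0) return total;
    await processBatch(batch);
    total += batch.length;
  }
}
```

Here processBatch would be the runPool call from the snippet above. Because a crashed run leaves finished rows marked done, calling drain again after a restart picks up only what is still pending.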

4. Cache and De-Duplicate

Large keyword sets are repetitive. The same head terms appear across clients, campaigns and weekly re-runs. If you check “best running shoes” for ten customers, that is one ranking, not ten.

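// Assumes `cache`, `provider` and `withRetry` exist elsewhere; `ttl` is in milliseconds.
async function cachedSerp({ engine, country, query }, ttl) {
  const key = `serp:${engine}:${country}:${query.toLowerCase().trim()}`;
  const hit = await cache.get(key);
  if (hit && Date.now() - hit.at < ttl) return hit.data; // fresh hit: no API call
  const data = await withRetry(() => provider(query));
  await cache.set(key, { data, at: Date.now() });       // store with a timestamp for the TTL check
  return data;
}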

The fastest, cheapest, most rate-limit-proof request is the one you never send. At scale, caching is not an optimization — it is part of the architecture.
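A cache only helps on the second lookup. Within a single run you can also collapse duplicates before anything is sent. A minimal sketch, assuming each request is an object with engine, country and query fields. normalizeQuery and dedupe are hypothetical names, and collapsing repeated whitespace is an extra assumption on top of the lowercase-and-trim rule used for the cache key.

```javascript
// Normalizes a query: trim, lowercase, and collapse runs of whitespace.
function normalizeQuery(query) {
  return query.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Groups requests that share engine + country + normalized query.
// Returns a Map from cache key to the list of original requests it covers.
function dedupe(requests) {
  const groups = new Map();
  for (const req of requests) {
    const key = `serp:${req.engine}:${req.country}:${normalizeQuery(req.query)}`;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(req);
  }
  return groups;
}
```

Send one call per key, then fan the result back out to every request in that group. Ten customers tracking “best running shoes” in the same country become one request.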

Putting Numbers On It

Say you have 1,000,000 keywords and each call averages 1.5 seconds.

The naive burst: undefined behaviour, mostly failures, unknowable finish time, a large bill for retried calls. The bounded pipeline at concurrency 8: roughly 1,000,000 × 1.5s ÷ 8 ≈ 52 hours of pure call time — before caching. Add a 40% cache hit rate from duplication and re-runs and you are nearer 31 hours, at 40% lower cost, with a near-100% success rate, and you can stop and resume any time.

Bump concurrency to 16 (if your tier and success rate allow) and you roughly halve the wall-clock time, to about 16 hours at the same cache hit rate. The point is not the exact hours; it is that the bounded version is predictable. You can quote a finish time. The naive version you cannot even estimate.
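The arithmetic is simple enough to keep as a function and re-run whenever an input changes. A minimal sketch: estimateHours is a hypothetical helper, and it assumes every worker stays busy and cache hits cost no time, so treat the result as a floor rather than a promise.

```javascript
// Estimates pure call time for a bounded pipeline.
// keywords: total keyword count; avgSeconds: average time per API call;
// concurrency: requests in flight; cacheHitRate: fraction (0..1) served from cache.
function estimateHours({ keywords, avgSeconds, concurrency, cacheHitRate = 0 }) {
  const calls = keywords * (1 - cacheHitRate);        // requests actually sent
  const seconds = (calls * avgSeconds) / concurrency; // workers run in parallel
  return { calls, hours: seconds / 3600 };
}

const base = { keywords: 1000000, avgSeconds: 1.5 };
const plain = estimateHours({ ...base, concurrency: 8 });
const cached = estimateHours({ ...base, concurrency: 8, cacheHitRate: 0.4 });
const wider = estimateHours({ ...base, concurrency: 16, cacheHitRate: 0.4 });
console.log(plain.hours.toFixed(1), cached.hours.toFixed(1), wider.hours.toFixed(1));
```

With the numbers from this section it gives roughly 52, 31 and 16 hours. Retries and slow tail requests push the real figure up a little, which is one more reason to watch p95 rather than the average.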

Cost predictability is the other half. With flat per-call pricing — the model Serpent API uses, and which we break down in the pricing comparison — a million calls is a number you can multiply out in advance instead of discovering on an invoice. Queue-and-credit models make that math much harder, which matters more the bigger you get.

The Scale Checklist

Bounded concurrency. A worker pool, never a raw Promise.all over the full list.
Retry only retryable errors, with exponential backoff plus jitter, honouring Retry-After.
Idempotent, resumable jobs. Per-keyword status, results written as you go.
Cache by engine + country + query with a sane TTL; de-dupe before you spend.
Store raw + parsed so re-runs and provider changes never corrupt history.
Watch success rate and p95, not just throughput. Tune concurrency against them.
Pick predictable pricing so a million calls is a multiplication, not a surprise.

Do these and “millions of SERP requests” stops being scary. It becomes a job that runs overnight, tells you where it is, and finishes when you said it would. That is the whole goal: boring, predictable, resumable scale.

FAQ

Why do I get rate-limited even with a high plan?

Almost always because of unbounded concurrency on your side, not the plan ceiling. A naive Promise.all over a big keyword list fires thousands of simultaneous requests in the first second, which trips per-account concurrency or per-minute limits instantly. The fix is a bounded worker pool — a fixed number of in-flight requests — not a bigger plan.

What is exponential backoff with jitter?

When a request fails with a retryable error such as HTTP 429 or 503, you wait before retrying, and you increase the wait on each attempt (exponential). Jitter adds a random component so many clients do not all retry on the same tick. Without jitter, retries synchronize into a thundering herd that keeps the endpoint saturated. The common formula is sleep = base * 2^attempt + random(0, base).

How many concurrent SERP requests should I run?

Start low — 4 to 8 in-flight — and raise it only while watching your success rate and p95 latency. The right number is where throughput stops improving or error rate starts climbing, whichever comes first. A steady 8 that all succeed beats 200 that mostly 429. Stay within your provider’s documented concurrency for your tier.

How do I make a million-keyword job resumable?

Treat each keyword as an idempotent job with its own status in a queue or table: pending, done, failed. Write results as each job completes, not at the end. If the process dies at keyword 600,000, you restart and it skips everything already marked done. Never hold a million results in memory or in one transaction.

Does caching really matter at scale?

It matters most at scale. Ranking data is not that volatile, and large keyword sets contain heavy duplication and overlap across runs. A cache keyed by engine, country and query, with a sensible time-to-live, routinely removes a large share of calls. The cheapest, fastest, most reliable request is the one you never send.