Build a SERP Cache That Cuts Your API Bill 60%

By Anurag Pathak · May 17, 2026 · 12 min read

The cheapest SERP call is the one you never send. That sentence is the entire business case for this post, and "60%" is not marketing — it is roughly the duplication rate hiding in a typical SEO or rank-tracking workload, waiting to be reclaimed. Here is the layer that reclaims it, in code, without serving anyone stale or wrong data.

Why 60% Is a Realistic Number

SERP workloads are repetitive in three independent ways, and the repetition compounds:

Across customers. Ten clients all want "best CRM software." That is one ranking, billed once if you let it be.
Across time. A weekly rank tracker re-checks the same keyword set every Monday; most rankings barely moved.
Across features. A dashboard, an alert job and an export can each independently ask for the same query within minutes.

Your real number is your repeat-query ratio — measure it (we do that at the end). For most teams it lands between 40% and 70%. That entire band is money you are currently paying to fetch data you already have.
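A quick way to get a first estimate is to replay a request log and count how many calls repeat an earlier key. The sketch below assumes a log of plain objects with engine, query, country, lang, device and num fields (that shape is an assumption, not something your provider dictates), and it ignores time, so treat the result as an upper bound on the hit rate rather than a forecast.

```javascript
// Sketch: estimate the repeat-query ratio from a log of past requests.
// The log entry shape is assumed for illustration.
function repeatQueryRatio(requests) {
  if (requests.length === 0) return 0;
  const seen = new Set();
  for (const r of requests) {
    // Same normalization a cache key would use, so near-duplicates collapse.
    const q = r.query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
    seen.add([r.engine, r.country, r.lang, r.device, r.num, q].join('|'));
  }
  // Every request beyond the first for a given key is a repeat.
  return 1 - seen.size / requests.length;
}

const sample = [
  { engine: 'google', query: 'Best CRM ',   country: 'us', lang: 'en', device: 'desktop', num: 10 },
  { engine: 'google', query: 'best crm',    country: 'us', lang: 'en', device: 'desktop', num: 10 },
  { engine: 'google', query: 'best  crm',   country: 'us', lang: 'en', device: 'desktop', num: 10 },
  { engine: 'google', query: 'best crm',    country: 'gb', lang: 'en', device: 'mobile',  num: 10 },
  { engine: 'google', query: 'what is seo', country: 'us', lang: 'en', device: 'desktop', num: 10 },
];

console.log(repeatQueryRatio(sample)); // 0.4 (2 of 5 calls repeat an earlier key)
```

Run it over a week of real traffic, not a day: the across-time repetition only shows up once the log spans at least one full tracking cycle.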

The Cache Key Is the Whole Game

Every cache bug and every missed saving traces back to the key. The rule is exact: the key must contain every input that changes the result, and nothing that doesn't.

function serpCacheKey({ engine, query, country, lang, device, num }) {
  const q = query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
  // include everything that changes the result...
  return `serp:v1:${engine}:${country}:${lang}:${device}:${num}:${q}`;
  // ...and nothing that doesn't (no requestId, no timestamp, no userId)
}

Two failure modes, both common: leave out country or device and you serve a US desktop result to a UK mobile user — a wrong-result hit, worse than a miss. Or fold in a per-request field like a trace ID and every key is unique, so your hit rate is zero and the cache is just latency. Normalization (the NFKC, the trim, the whitespace collapse) is where a surprising share of the 60% is actually won — "Best CRM " and "best crm" are the same ranking.
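One way to guard against the missing-dimension failure is to make the key builder refuse to run when a dimension is absent, rather than quietly writing "undefined" into the key. The following is a minimal sketch under that assumption; strictSerpCacheKey and KEY_FIELDS are hypothetical names, and lowercasing the dimensions assumes your provider treats "US" and "us" as the same country.

```javascript
// Sketch: a stricter key builder that throws when a result-changing
// dimension is missing. Names here are illustrative, not a library API.
const KEY_FIELDS = ['engine', 'country', 'lang', 'device', 'num'];

function strictSerpCacheKey(params) {
  for (const field of KEY_FIELDS) {
    const value = params[field];
    if (value === undefined || value === null || value === '') {
      throw new Error(`serp cache key: missing "${field}"`);
    }
  }
  if (typeof params.query !== 'string' || params.query.trim() === '') {
    throw new Error('serp cache key: missing "query"');
  }
  const q = params.query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
  // Assumption: dimension values are case-insensitive for the provider.
  const dims = KEY_FIELDS.map((f) => String(params[f]).toLowerCase());
  return `serp:v1:${dims.join(':')}:${q}`;
}

const base = { engine: 'google', country: 'us', lang: 'en', device: 'desktop', num: 10 };
const a = strictSerpCacheKey({ ...base, query: 'Best  CRM ' });
const b = strictSerpCacheKey({ ...base, query: 'best crm' });
const c = strictSerpCacheKey({ ...base, country: 'gb', query: 'best crm' });

console.log(a === b); // true: same ranking, same key
console.log(a === c); // false: a different country is a different result
```

Throwing is a deliberate choice here: a loud failure in development is cheaper than a wrong-result hit in production.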

Figure: cache key normalization code. Most of the savings, and every wrong-result bug, live in the key.

TTL by Intent, Not by Default

A single global TTL is always wrong. Set it to one hour and trending-news queries are stale; set it to one day and you are throwing away cheap hits on stable informational terms. Bucket by intent instead:

Query intent	Volatility	Reasonable TTL band
Evergreen informational ("what is X")	Very low	Hours to a day+
Commercial ("best X tool")	Low	Several hours
Local / "near me"	Medium	About an hour
News / trending	High	Minutes
Price / availability	Very high	Single-digit minutes

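const TTL = { evergreen: 86400, commercial: 21600, local: 3600, news: 300, price: 120 }; // seconds
function ttlFor(query) {
  // Most volatile bucket first: the first match wins.
  if (/\b(price|cheap|deal|in stock)\b/i.test(query)) return TTL.price;
  if (/\b(news|today|latest|2026)\b/i.test(query))    return TTL.news;
  // "in <Capitalized place>" stays case-sensitive: with the i flag, [A-Z] matches any letter.
  if (/\bnear me\b/i.test(query) || /\bin [A-Z]/.test(query)) return TTL.local;
  // Group the alternatives so \b applies to every word ("laptop" must not match "top").
  if (/\b(best|top|vs|reviews?)\b/i.test(query))      return TTL.commercial;
  return TTL.evergreen;
}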

This classifier is deliberately crude — a regex, not a model. It does not need to be smart; it needs to keep volatile queries fresh and let stable ones ride. If you also use freshness filters, align the TTL bands with the freshness windows you request so the two don't fight.
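As one possible reading of "align", you can cap the intent TTL at the freshness window you asked for, so a "past hour" request is never cached for a day. This is a sketch under assumed names: the freshness values and their windows below are hypothetical, so substitute whatever your provider actually accepts. Remember that a freshness filter changes the result, so it also belongs in the cache key.

```javascript
// Sketch: cap the intent-based TTL at the requested freshness window.
// The filter names and window sizes are assumptions for illustration.
const FRESHNESS_WINDOW = new Map([
  ['hour', 3600],
  ['day', 86400],
  ['week', 604800],
]);

function effectiveTtl(intentTtlSeconds, freshness) {
  const windowSeconds = FRESHNESS_WINDOW.get(freshness);
  if (windowSeconds === undefined) return intentTtlSeconds; // no filter requested
  return Math.min(intentTtlSeconds, windowSeconds);
}

console.log(effectiveTtl(86400, 'hour')); // 3600: the window is tighter than the intent TTL
console.log(effectiveTtl(300, 'day'));    // 300: the intent TTL is already tighter
```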

Stale-While-Revalidate: The Highest-Leverage Pattern

The naive cache blocks a user on a live call the instant a key expires — so every TTL boundary is a latency spike and a billed call. Stale-while-revalidate breaks that coupling: serve the slightly-stale value now, refresh in the background for the next caller.

async function getSerp(params) {
  const key = serpCacheKey(params);
  const hit = await cache.get(key);          // { data, at } or { negative, at }
  if (hit?.negative) throw new Error('cached failure');  // see Negative Caching
  const age = hit ? Date.now() - hit.at : Infinity;
  const ttl = ttlFor(params.query) * 1000;

  if (hit && age < ttl) return hit.data;                 // fresh
  if (hit && age < ttl * 3) {                             // stale but usable (up to 3x TTL)
    // fire-and-forget, but never leave the rejection unhandled
    revalidate(key, params).catch((e) => console.error('serp revalidate failed', e));
    return hit.data;                                      // instant response
  }
  return await fetchAndStore(key, params);                // cold miss
}

async function revalidate(key, params) {
  if (await lock.acquire(key)) {            // one refresher, not a stampede
    try { await fetchAndStore(key, params); } finally { lock.release(key); }
  }
}

The lock matters. Without it, a popular expired key triggers a refresh from every concurrent caller at once — a cache stampede that spikes your bill exactly when traffic is highest. One refresher per key; everyone else rides the stale value. This is the same anti-stampede discipline as the jittered backoff at scale, applied to the cache instead of the retry path.
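If your service runs as a single process, the lock can be as small as a map of in-flight promises. The sketch below is one way to do that; singleFlight, fakeFetch and demo are hypothetical names, and the provider call is faked with a timer. Across several processes you would need a shared lock instead (for example Redis SET with the NX and PX options), which this sketch does not cover.

```javascript
// Sketch: in-process "single flight". Concurrent callers for the same key
// share one in-flight promise, so the provider is called once per key.
const inFlight = new Map();

function singleFlight(key, fn) {
  const existing = inFlight.get(key);
  if (existing) return existing;
  const p = Promise.resolve()
    .then(fn)
    .finally(() => inFlight.delete(key)); // release the slot on success or failure
  inFlight.set(key, p);
  return p;
}

// Fires 50 concurrent refreshes for one key and reports how many
// times the (fake) provider was actually called.
async function demo(key) {
  let providerCalls = 0;
  async function fakeFetch() {
    providerCalls += 1;
    await new Promise((resolve) => setTimeout(resolve, 10));
    return { results: ['example.com'] };
  }
  const callers = Array.from({ length: 50 }, () => singleFlight(key, fakeFetch));
  await Promise.all(callers);
  return providerCalls;
}

demo('serp:v1:demo').then((calls) => console.log(calls)); // 1
```

The slot is released in finally, so a failed refresh does not block the key forever; the next caller simply tries again.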

Negative Caching

If a query reliably fails — malformed input, an unsupported locale, a genuinely zero-result term — an uncached retry loop will hammer that same failing call hundreds of times and bill you for every one. Cache the failure too, briefly:

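// One plausible body for the fetchAndStore used earlier.
async function fetchAndStore(key, params) {
  try {
    const data = await provider(params);
    await cache.set(key, { data, at: Date.now() });
    return data;
  } catch (e) {
    // Cache only deterministic failures; timeouts and 5xx should stay retryable.
    if (e.status === 400 || e.emptyButValid) {
      await cache.set(key, { negative: true, at: Date.now() }, { ttl: 90 }); // seconds
    }
    throw e;
  }
}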

Keep the negative TTL short so a transient blip recovers fast — but never zero, or one bad query in a big batch can quietly dominate your spend. This pairs directly with the error-handling discipline in why your scraper breaks at 3 a.m.
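Writing the negative entry is only half of it: the read path has to recognise the marker and fail fast instead of calling the provider again. Here is a self-contained sketch of that round trip, assuming an in-memory Map as the cache and a hypothetical provider function; it leaves out positive-entry TTLs to keep the negative path visible.

```javascript
// Sketch: honour negative entries on the read path.
// `cache` is a plain Map and `provider` is a stand-in for the real client.
const NEGATIVE_TTL_MS = 90 * 1000;

async function getWithNegativeCache(cache, key, provider, now = Date.now()) {
  const hit = cache.get(key);
  if (hit && hit.negative) {
    if (now - hit.at < NEGATIVE_TTL_MS) {
      const err = new Error('cached failure');
      err.cachedNegative = true; // lets callers tell a cached failure from a live one
      throw err;
    }
    cache.delete(key); // negative entry expired: allow one fresh attempt
  } else if (hit) {
    return hit.data;
  }
  try {
    const data = await provider();
    cache.set(key, { data, at: now });
    return data;
  } catch (e) {
    if (e.status === 400) cache.set(key, { negative: true, at: now });
    throw e;
  }
}
```

With this in place, a batch that retries the same malformed query a hundred times inside the window sends one billed call and gets ninety-nine cached failures.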

A SERP cache is not an optimization you bolt on later. At any real volume it is part of the architecture — the same way it is in RAG and agent pipelines, where the same query recurs constantly across runs.

Measuring the Bill You Saved

"60%" is only true if you can show it. Instrument the cache from day one — you cannot improve a hit rate you do not log:

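metrics.increment(`serp.cache.${outcome}`);  // hit | stale | miss | negative

// the only number that matters to finance.
// hit, stale, miss and negative are the counter totals for the reporting window;
// coldRevalidate counts the background refreshes actually sent to the provider
// (log it separately: with the lock, that is at most one per stale key).
const billable  = miss + coldRevalidate;          // calls actually sent
const wouldHave = hit + stale + miss + negative;  // calls without a cache
const savedPct  = wouldHave === 0 ? 0 : (1 - billable / wouldHave) * 100;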

Put savedPct on the same dashboard as your spend. That is the line that justifies the cache in a budget review, and it is the one to watch when a key change or a TTL tweak silently regresses your hit rate. If you want the wider context for that dashboard, see SERP API observability; for the spend side, the pricing comparison shows why a flat per-call model makes "calls saved × price" a clean multiplication.
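To make the arithmetic concrete, here is the same formula as a small function with one worked set of counters. The numbers are invented for illustration, not measurements, and savedPercent is a hypothetical helper name.

```javascript
// Sketch: the saved-percentage formula as a function over counter totals.
function savedPercent({ hit, stale, miss, negative, coldRevalidate }) {
  const billable = miss + coldRevalidate;          // calls actually sent
  const wouldHave = hit + stale + miss + negative; // calls without a cache
  if (wouldHave === 0) return 0;                   // avoid dividing by zero on an empty window
  return (1 - billable / wouldHave) * 100;
}

// Illustrative counters: 1,000 requests, 430 of them billed.
const example = { hit: 500, stale: 100, miss: 350, negative: 50, coldRevalidate: 80 };
console.log(savedPercent(example).toFixed(1)); // "57.0"
```

Note that stale hits are not free: each one can trigger a background refresh, which is why coldRevalidate sits on the billable side.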

Figure: a cache dashboard showing the share of SERP API calls eliminated. If you can't show the saved-percentage line, you can't defend the cache in a budget review.

Five Ways to Poison a SERP Cache

An unstable key. A timestamp or request ID in the key means a 0% hit rate and pure overhead.
A missing dimension. No country/device in the key serves wrong results — the worst bug, because it looks like a saving.
One global TTL. Guaranteed to be too stale or too wasteful, never right.
No stampede lock. Expiry under load becomes a self-inflicted traffic spike.
No instrumentation. An unmeasured cache is a belief, not a saving.

Avoid those five and the layer above turns repetition — which your workload has a lot of — into the single highest-ROI change you can make to a SERP bill. Build it on a flat per-call API and every cached call is a number you can point to.

FAQ

How much can caching realistically cut a SERP API bill?

It depends entirely on query repetition, but for typical rank-tracking and SEO workloads a 40–70% reduction is normal, because the same head terms recur across customers, campaigns and weekly re-runs. The exact figure is whatever your duplication rate is — measure your repeat-query ratio before and after to get your real number.

What should the SERP cache key be?

Every input that changes the result, and nothing that doesn't. That means engine, normalized query, country, language, device and result depth, with the query Unicode-normalized, lowercased, trimmed and whitespace-collapsed. Leaving out country or device causes wrong-result cache hits; including volatile fields like a request ID destroys the hit rate. Normalization is where much of the saving is won or lost.

What TTL should I use for cached SERP data?

TTL should follow query intent, not a single global value. Stable informational queries tolerate hours or days; news, trending and price-sensitive queries need minutes. A single global TTL is always either too stale for volatile queries or too wasteful for stable ones, so bucket by intent.

What is stale-while-revalidate for a SERP cache?

It serves the cached result immediately even if slightly expired, then refreshes it in the background for the next caller. Users get a fast response and the data self-heals; only a cold miss, or an entry too old to serve, blocks on a live call. It is the single highest-leverage pattern for both latency and cost in a SERP cache.

Should I cache failed SERP requests too?

Yes — briefly. Negative caching a known-bad query for a short window stops a retry storm from hammering the same failing call hundreds of times. Keep the negative TTL short so transient failures recover quickly, but never zero, or one bad query can dominate your spend.

Related Posts

How to Run Millions of SERP Requests Without Getting Rate-Limited

SERP API Pricing Comparison 2026

SERP API Observability: The Metrics That Catch Failures Early

How to Give Your AI Agent Real-Time Search with a SERP API