Build a SERP Cache That Cuts Your API Bill 60%

By Anurag Pathak · 12 min read

The cheapest SERP call is the one you never send. That sentence is the entire business case for this post, and "60%" is not marketing — it is roughly the duplication rate hiding in a typical SEO or rank-tracking workload, waiting to be reclaimed. Here is the layer that reclaims it, in code, without serving anyone stale or wrong data.

Why 60% Is a Realistic Number

SERP workloads are repetitive in three independent ways, and the repetition compounds: the same head terms recur across customers, across campaigns, and across weekly re-runs.

Your real number is your repeat-query ratio — measure it (we do that at the end). For most teams it lands between 40% and 70%. That entire band is money you are currently paying to fetch data you already have.
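As a rough way to estimate that ratio before building anything, the sketch below counts how many requests in a log repeat an earlier one. The log shape (`engine`, `query`, `country`, `lang`, `device`, `num`) and the function name `repeatQueryRatio` are assumptions for illustration. The result is an upper bound on the achievable hit rate, because it ignores expiry.

```javascript
// Hypothetical log shape: one object per SERP request the system sent.
// Returns the fraction of requests that duplicate an earlier request.
function repeatQueryRatio(requests) {
  const seen = new Set();
  let repeats = 0;
  for (const r of requests) {
    // Same normalization idea as the cache key: trivial variants count as one query.
    const q = r.query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
    const key = [r.engine, r.country, r.lang, r.device, r.num, q].join(':');
    if (seen.has(key)) repeats += 1;
    else seen.add(key);
  }
  return requests.length === 0 ? 0 : repeats / requests.length;
}

// Made-up sample data, only to show the call.
const base = { engine: 'google', country: 'us', lang: 'en', device: 'desktop', num: 10 };
const sampleLog = [
  { ...base, query: 'best crm' },
  { ...base, query: 'Best CRM ' },                                // repeat after normalization
  { ...base, query: 'best crm', country: 'gb', device: 'mobile' }, // different result, not a repeat
  { ...base, query: 'what is dns' },
  { ...base, query: 'what  is DNS' },                             // repeat after normalization
];

console.log(repeatQueryRatio(sampleLog));
```

Run it over a week of real request logs rather than a sample; the ratio usually grows with the window, because weekly re-runs only show up as repeats once the window spans more than one run.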

The Cache Key Is the Whole Game

Every cache bug and every missed saving traces back to the key. The rule is exact: the key must contain every input that changes the result, and nothing that doesn't.

function serpCacheKey({ engine, query, country, lang, device, num }) {
  const q = query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' ');
  // include everything that changes the result...
  return `serp:v1:${engine}:${country}:${lang}:${device}:${num}:${q}`;
  // ...and nothing that doesn't (no requestId, no timestamp, no userId)
}

Two failure modes, both common: leave out country or device and you serve a US desktop result to a UK mobile user — a wrong-result hit, worse than a miss. Or fold in a per-request field like a trace ID and every key is unique, so your hit rate is zero and the cache is just latency. Normalization (the NFKC, the trim, the whitespace collapse) is where a surprising share of the 60% is actually won — "Best CRM " and "best crm" are the same ranking.
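The same discipline applies to the non-query fields: `US` versus `us`, or an omitted `device`, can split one result across several keys. The sketch below canonicalizes every field before building the key. The `DEFAULTS` values and the names `canonicalParams` and `canonicalKey` are assumptions, and the defaults are only safe if they mirror what your provider does when a field is omitted.

```javascript
// Assumed defaults. They must match the provider's behavior for omitted fields,
// otherwise two different results would share one key.
const DEFAULTS = { engine: 'google', country: 'us', lang: 'en', device: 'desktop', num: 10 };

function canonicalParams(params) {
  if (typeof params.query !== 'string') {
    throw new TypeError('query must be a string');
  }
  // Fall back to the default when a field is missing or explicitly undefined/null.
  const pick = (name) => params[name] ?? DEFAULTS[name];
  const lower = (value) => String(value).trim().toLowerCase();
  return {
    engine: lower(pick('engine')),
    country: lower(pick('country')),
    lang: lower(pick('lang')),
    device: lower(pick('device')),
    num: Number(pick('num')),
    query: params.query.normalize('NFKC').trim().toLowerCase().replace(/\s+/g, ' '),
  };
}

function canonicalKey(params) {
  const c = canonicalParams(params);
  return `serp:v1:${c.engine}:${c.country}:${c.lang}:${c.device}:${c.num}:${c.query}`;
}

const k1 = canonicalKey({ query: 'Best CRM ' });
const k2 = canonicalKey({ query: 'best  crm', country: 'US', device: 'Desktop', num: '10' });
const k3 = canonicalKey({ query: 'best crm', country: 'gb', device: 'mobile' });

console.log(k1 === k2); // same ranking, same key
console.log(k1 === k3); // different country and device, different key
```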

Figure: Cache key normalization code that decides whether SERP cache hits land or miss. Most of the savings, and every wrong-result bug, live in the key.

TTL by Intent, Not by Default

A single global TTL is always wrong. Set it to one day and trending-news queries go stale; set it to a few minutes and you throw away cheap hits on stable informational terms. Bucket by intent instead:

Query intent                            | Volatility | Reasonable TTL band
Evergreen informational ("what is X")   | Very low   | Hours to a day+
Commercial ("best X tool")              | Low        | Several hours
Local / "near me"                       | Medium     | About an hour
News / trending                         | High       | Minutes
Price / availability                    | Very high  | Single-digit minutes

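// TTLs in seconds, one per intent bucket
const TTL = { evergreen: 86400, commercial: 21600, local: 3600, news: 300, price: 120 };
function ttlFor(query) {
  // Most volatile bucket first, so "best price" stays on the short TTL
  if (/\b(price|cheap|deal|in stock)\b/i.test(query)) return TTL.price;
  if (/\b(news|today|latest|2026)\b/i.test(query))    return TTL.news;
  // No `i` flag on the second test: with it, [A-Z] would match any letter after "in "
  if (/\bnear me\b/i.test(query) || /\bin [A-Z]/.test(query)) return TTL.local;
  // Group the alternation so \b applies to every word ("laptop" must not match "top")
  if (/\b(best|top|vs|review)\b/i.test(query))        return TTL.commercial;
  return TTL.evergreen;
}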

This classifier is deliberately crude — a regex, not a model. It does not need to be smart; it needs to keep volatile queries fresh and let stable ones ride. If you also use freshness filters, align the TTL bands with the freshness windows you request so the two don't fight.

Stale-While-Revalidate: The Highest-Leverage Pattern

The naive cache blocks a user on a live call the instant a key expires — so every TTL boundary is a latency spike and a billed call. Stale-while-revalidate breaks that coupling: serve the slightly-stale value now, refresh in the background for the next caller.

async function getSerp(params) {
  const key = serpCacheKey(params);
  const hit = await cache.get(key);          // { data, at }
  const age = hit ? Date.now() - hit.at : Infinity;
  const ttl = ttlFor(params.query) * 1000;

  if (hit && age < ttl) return hit.data;                 // fresh
  if (hit && age < ttl * 3) {                             // stale but usable
    revalidate(key, params).catch(() => {});              // fire-and-forget; swallow refresh errors
    return hit.data;                                      // instant response
  }
  return await fetchAndStore(key, params);                // cold miss
}

async function revalidate(key, params) {
  if (await lock.acquire(key)) {            // one refresher, not a stampede
    try { await fetchAndStore(key, params); } finally { lock.release(key); }
  }
}

The lock matters. Without it, a popular expired key triggers a refresh from every concurrent caller at once — a cache stampede that spikes your bill exactly when traffic is highest. One refresher per key; everyone else rides the stale value. This is the same anti-stampede discipline as the jittered backoff at scale, applied to the cache instead of the retry path.
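The `lock` object used in `revalidate` is not defined in the snippet above. One minimal in-process version, assuming a single Node.js process, could look like the sketch below. The lease timeout and the `now` parameter are additions for illustration: the lease keeps a crashed refresher from blocking a key forever, and `now` makes the behavior easy to test.

```javascript
// Hypothetical in-process lock matching the lock.acquire / lock.release calls above.
// It returns plain booleans; `await lock.acquire(key)` still works on a non-promise value.
const held = new Map(); // key -> lease expiry timestamp in ms

const lock = {
  acquire(key, leaseMs = 30000, now = Date.now()) {
    const expiry = held.get(key);
    if (expiry !== undefined && expiry > now) return false; // another caller is refreshing
    held.set(key, now + leaseMs); // take the lock, or take over an expired lease
    return true;
  },
  release(key) {
    held.delete(key);
  },
};

// Walkthrough with explicit timestamps instead of the real clock.
const first = lock.acquire('serp:demo', 1000, 0);             // first refresher gets the lock
const second = lock.acquire('serp:demo', 1000, 500);          // concurrent caller is turned away
const afterExpiry = lock.acquire('serp:demo', 1000, 1500);    // lease expired, lock is taken over
lock.release('serp:demo');
const afterRelease = lock.acquire('serp:demo', 1000, 1600);   // free again after release
lock.release('serp:demo');

console.log({ first, second, afterExpiry, afterRelease });
```

This only deduplicates refreshes inside one process. With several workers behind the same cache, the lock has to live in shared storage; Redis `SET key value NX PX <ms>` is one common way to get the same acquire-with-lease behavior.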

Negative Caching

If a query reliably fails — malformed input, an unsupported locale, a genuinely zero-result term — an uncached retry loop will hammer that same failing call hundreds of times and bill you for every one. Cache the failure too, briefly:

try {
  const data = await provider(params);
  await cache.set(key, { data, at: Date.now() });
  return data;
} catch (e) {
  if (e.status === 400 || e.emptyButValid) {
    await cache.set(key, { negative: true, at: Date.now() }, { ttl: 90 });
  }
  throw e;
}

Keep the negative TTL short so a transient blip recovers fast — but never zero, or one bad query in a big batch can quietly dominate your spend. This pairs directly with the error-handling discipline in why your scraper breaks at 3 a.m.
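The write path above stores `{ negative: true, at }`, so the read path has to recognize that shape instead of returning `hit.data`, which would be `undefined`. A minimal sketch of that check, assuming the entry shapes used in this post; `classifyEntry` and `NEGATIVE_TTL_MS` are hypothetical names. The explicit age check is a safeguard for cache stores that ignore the `ttl` option.

```javascript
const NEGATIVE_TTL_MS = 90 * 1000; // mirrors the 90-second negative TTL used on the write path

// Classifies a raw cache entry so the caller never mistakes a cached failure for data.
function classifyEntry(entry, now = Date.now()) {
  if (!entry) return { state: 'miss' };
  const ageMs = now - entry.at;
  if (entry.negative) {
    // A recent cached failure: fail fast without a billed call.
    if (ageMs < NEGATIVE_TTL_MS) return { state: 'negative', ageMs };
    // The failure is old enough that one live retry is worth paying for.
    return { state: 'miss' };
  }
  return { state: 'hit', data: entry.data, ageMs };
}

// Walkthrough with a fixed clock.
const t0 = 1000000;
const fresh = classifyEntry({ data: { results: [] }, at: t0 }, t0 + 5000);
const recentFailure = classifyEntry({ negative: true, at: t0 }, t0 + 30000);
const expiredFailure = classifyEntry({ negative: true, at: t0 }, t0 + 120000);
const nothing = classifyEntry(undefined, t0);

console.log(fresh.state, recentFailure.state, expiredFailure.state, nothing.state);
```

In `getSerp`, a `negative` state would typically throw a cached error right away and a `hit` state would continue into the fresh-or-stale comparison.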

A SERP cache is not an optimization you bolt on later. At any real volume it is part of the architecture — the same way it is in RAG and agent pipelines, where the same query recurs constantly across runs.

Measuring the Bill You Saved

"60%" is only true if you can show it. Instrument the cache from day one — you cannot improve a hit rate you do not log:

metrics.increment(`serp.cache.${outcome}`);  // hit | stale | miss | negative

// the only number that matters to finance:
const billable  = miss + coldRevalidate;           // calls actually sent: cold misses plus background refreshes
const wouldHave = hit + stale + miss + negative;   // calls without a cache
const savedPct  = (1 - billable / wouldHave) * 100;

Put savedPct on the same dashboard as your spend. That is the line that justifies the cache in a budget review, and it is the one to watch when a key change or a TTL tweak silently regresses your hit rate. If you want the wider context for that dashboard, see SERP API observability; for the spend side, the pricing comparison shows why a flat per-call model makes "calls saved × price" a clean multiplication.
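As a worked example of that arithmetic, the sketch below computes the saved percentage from raw counters. The counter names and figures are made up for illustration, and `revalidations` is an assumed name for the number of background refreshes actually sent (the role `coldRevalidate` plays above).

```javascript
// Computes the share of provider calls the cache avoided, as a percentage.
function savedPct({ hit = 0, stale = 0, miss = 0, negative = 0, revalidations = 0 }) {
  const wouldHave = hit + stale + miss + negative; // every request would have been a live call
  if (wouldHave === 0) return 0;                   // no traffic yet, nothing saved
  const billable = miss + revalidations;           // live calls that were actually sent
  // Same value as (1 - billable / wouldHave) * 100, arranged to keep the arithmetic on integers longer.
  return (100 * (wouldHave - billable)) / wouldHave;
}

// Illustrative counters for one day: 1000 requests, 410 billed calls.
const day = { hit: 520, stale: 80, miss: 350, negative: 50, revalidations: 60 };

console.log(savedPct(day)); // 59
```

With the refresh lock in place, `revalidations` should never exceed `stale`; if it does on the dashboard, the lock is not doing its job.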

Figure: A cache dashboard showing the share of SERP API calls eliminated. If you can't show the saved-percentage line, you can't defend the cache in a budget review.

Five Ways to Poison a SERP Cache

Avoid those five and the layer above turns repetition — which your workload has a lot of — into the single highest-ROI change you can make to a SERP bill. Build it on a flat per-call API and every cached call is a number you can point to.

FAQ

How much can caching realistically cut a SERP API bill?

It depends entirely on query repetition, but for typical rank-tracking and SEO workloads a 40–70% reduction is normal, because the same head terms recur across customers, campaigns and weekly re-runs. The exact figure is whatever your duplication rate is — measure your repeat-query ratio before and after to get your real number.

What should the SERP cache key be?

Every input that changes the result, and nothing that doesn't. That means engine, country, language, device, result depth, and the query itself, normalized (lowercased, trimmed, whitespace collapsed). Leaving out country or device causes wrong-result cache hits; including volatile fields like a request ID destroys the hit rate. Normalization is where most of the savings are won or lost.

What TTL should I use for cached SERP data?

TTL should follow query intent, not a single global value. Stable informational queries tolerate hours or days; news, trending and price-sensitive queries need minutes. A single global TTL is always either too stale for volatile queries or too wasteful for stable ones, so bucket by intent.

What is stale-while-revalidate for a SERP cache?

It serves the cached result immediately even if slightly expired, then refreshes it in the background for the next caller. Users get a fast response and the data self-heals without a request ever blocking on a live call. It is the single highest-leverage pattern for both latency and cost in a SERP cache.

Should I cache failed SERP requests too?

Yes — briefly. Negative caching a known-bad query for a short window stops a retry storm from hammering the same failing call hundreds of times. Keep the negative TTL short so transient failures recover quickly, but never zero, or one bad query can dominate your spend.