Should You Block or Allow AI Crawlers? A 2026 Decision Guide

By Serpent API Team · 10 min read

It's the dilemma every publisher and marketing lead is wrestling with in 2026. Block the AI crawlers and protect your content — but disappear from the AI answers that more and more people now read instead of clicking. Or let them in — and risk feeding the very machine that's hollowing out your traffic.

The stakes are real. Some publishers have reported organic traffic dropping 70–80% as AI answers expanded; one education site reported nearly a 50% decline; certain news queries fell almost 90%. NPR called it an "extinction-level event" for online news. Against that, brands cited inside an AI Overview see around 35% higher click-through than those left out.

So there is no one-size-fits-all answer. It's a decision, and this guide gives you a clear, numbers-based way to make it for your site, plus the exact robots.txt lines once you've chosen.

Meet the crawlers (they're not all the same)

"AI crawler" lumps together bots that do very different jobs. The key distinction: training crawlers harvest content to train future models, while answer/grounding crawlers fetch pages to build the live answers users see now — the ones that can cite you.

That difference is the whole game: you can welcome the bots that put you in answers while declining the ones that only feed training sets.

The two real questions

Strip away the noise and your decision comes down to two:

1. Does AI visibility help or hurt your business model? If you sell products, services, or software, being recommended by ChatGPT is a top-of-funnel win — the click matters less than the recommendation. If you are the content (ad-funded media, a reference site), every answered query without a click is lost revenue.

2. Are you already getting cited? If AI engines cite you and send qualified visitors, blocking would cut off a working channel. If they scrape you and never cite you, you're getting the worst of both worlds, and blocking the training bots costs you little.

Does AI visibility help your model?
  No (you ARE the content): block training bots; consider licensing deals.
  Yes (you sell something): allow answer bots; optimize to get cited.
  Either way: measure citations before and after.
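The two questions can also be written down as a small function, which makes the logic easy to review with colleagues. This is a minimal sketch of one reading of the questions above, not a complete policy: the function name `recommend` and its parameter names are hypothetical, and the action strings simply restate the text.

```python
def recommend(visibility_helps_model: bool, already_cited: bool) -> list[str]:
    """Return suggested actions for one site (illustrative only)."""
    if visibility_helps_model:
        # You sell something: the recommendation matters more than the click.
        actions = ["Allow answer bots", "Optimize to get cited"]
    else:
        # You are the content: answered queries without a click cost revenue.
        actions = ["Block training bots", "Consider licensing deals"]

    if already_cited:
        # Blocking the bots that cite you would cut a channel that works today.
        actions.append("Keep answer bots allowed: citations are a working channel")
    else:
        # Scraped but never cited: declining training crawlers loses little.
        actions.append("Blocking training bots costs you little")

    # Whatever the branch, the decision should be checked against data.
    actions.append("Measure citations before and after")
    return actions


if __name__ == "__main__":
    for step in recommend(visibility_helps_model=False, already_cited=True):
        print("-", step)
```

The inputs are deliberately two booleans; in practice each answer is a judgment backed by your own analytics rather than a simple yes or no.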

The decision matrix

Mapping the two questions gives four practical positions:

Before you decide, get the data. Check whether AI answers actually cite you today with the AI Rank API and the Google SERP API. See pricing →

The robots.txt lines

Once you've chosen, the implementation is a few lines at your site root. To block training crawlers but keep answer visibility:

# robots.txt — block training, allow answer/grounding bots

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

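# Anthropic documents ClaudeBot as its training crawler, so it is
# blocked here too; remove this group if you prefer to allow it.
User-agent: ClaudeBot
Disallow: /

# (left allowed by omission: OAI-SearchBot, PerplexityBot,
#  and Googlebot for normal Search)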

To block everything AI while staying in classic Google Search:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

# Googlebot is NOT listed here, so normal Search indexing continues.
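Before deploying a file like these, it can help to confirm that the rules do what you expect. The sketch below uses Python's standard `urllib.robotparser` against rules like the first example; the helper name `allowed_agents` and the `example.com` URL are placeholders, and the standard-library parser may differ in edge cases from how each crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Rules modelled on the "block training" example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
"""


def allowed_agents(robots_txt, agents, url="https://example.com/article"):
    """Map each user-agent token to True if the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "Google-Extended", "OAI-SearchBot", "PerplexityBot", "Googlebot"]
    for agent, allowed in allowed_agents(ROBOTS_TXT, agents).items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Agents with no matching group, and no `*` group, are treated as allowed, which is why the answer bots and Googlebot pass by omission.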

Two cautions. First, robots.txt is a polite request, not a wall — reputable bots honour it, but it isn't enforcement. Second, blocking Google-Extended does not remove you from Google Search; Googlebot is a separate agent. If you also want to publish an AI-friendly content map, see our llms.txt guide.
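Since robots.txt is a request rather than enforcement, one way to check whether crawlers honour it is to count their requests in your server logs. The following is a rough sketch that assumes the common "combined" access-log format; the function name `count_ai_hits` and the sample log lines are made up for illustration. User-agent strings can be spoofed, so treat the counts as a signal rather than proof.

```python
import re
from collections import Counter

# Tokens taken from the robots.txt examples above. Google-Extended is left
# out on purpose: it is a robots.txt control token, and Google crawls with
# its existing user-agent strings, so it does not appear in access logs.
AI_BOT_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

# Assumes the "combined" log format; adjust the pattern for your server.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical sample lines for demonstration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [01/Mar/2026:10:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 210 '
    '"-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [01/Mar/2026:10:05:00 +0000] "GET /blog HTTP/1.1" 200 8000 '
    '"-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.9 - - [01/Mar/2026:10:06:00 +0000] "GET /blog HTTP/1.1" 200 8000 '
    '"https://example.com/" "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"',
]


def count_ai_hits(lines):
    """Count page requests per AI crawler token, ignoring robots.txt fetches."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # Skip lines that do not fit the assumed format.
        if match.group("path") == "/robots.txt":
            continue  # Fetching robots.txt is expected even for blocked bots.
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                hits[token] += 1
                break
    return hits


if __name__ == "__main__":
    for token, count in count_ai_hits(SAMPLE_LOG).items():
        print(f"{token}: {count} page request(s)")
```

If a bot you have disallowed keeps requesting pages other than robots.txt, that is the point at which server-level or CDN-level blocking becomes worth considering.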

Decide with data, not vibes

The mistake is treating this as a one-time gut call. Treat it as an experiment:

  1. Baseline. Measure how often AI engines cite you today, and how much referral traffic they send (check your analytics for ChatGPT, Perplexity, and Gemini referrers).
  2. Change one thing. Adjust your robots.txt, or invest in getting cited.
  3. Re-measure in 4–6 weeks. Did citations and referral traffic rise or fall? Let the numbers, not the headlines, drive the next move.
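The baseline and re-measure steps can be sketched as a small script over exported referrer URLs. This is a minimal example under stated assumptions: the referrer hostnames in `AI_REFERRER_HOSTS` are ones to verify against your own analytics, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed referrer domains; confirm what actually shows up in your analytics.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}


def engine_for(referrer):
    """Return the AI engine name for a referrer URL, or None if it is not one."""
    host = (urlparse(referrer).hostname or "").lower()
    for domain, engine in AI_REFERRER_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return engine
    return None


def count_ai_referrals(referrers):
    """Count visits per AI engine from an iterable of referrer URLs."""
    return Counter(e for e in map(engine_for, referrers) if e)


def percent_change(before, after):
    """Percentage change from baseline; None when the baseline is zero."""
    if before == 0:
        return None
    return (after - before) / before * 100


if __name__ == "__main__":
    # Hypothetical exports: one referrer URL per visit in each period.
    baseline = ["https://chatgpt.com/", "https://www.perplexity.ai/search?q=x"]
    later = ["https://chatgpt.com/", "https://chatgpt.com/", "https://gemini.google.com/app"]
    before, after = count_ai_referrals(baseline), count_ai_referrals(later)
    for engine in sorted(set(before) | set(after)):
        print(engine, before[engine], "->", after[engine],
              percent_change(before[engine], after[engine]))
```

Comparing equal-length periods, and changing only one thing between them, keeps the before and after numbers interpretable.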

The companies that win this aren't the ones with the strongest opinion; they're the ones measuring. Track your citation share (the AI share-of-voice tracker is built for exactly this) and connect it to traffic trends as discussed in "The Great Decoupling". Then your robots.txt becomes a tested decision you can defend, not a guess.

FAQ

What happens if I block AI crawlers?

You stop your content from feeding those models, but you also become much less likely to be cited in their answers, which increasingly drive the clicks that remain.

Does blocking Google-Extended remove me from Google Search?

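No. Google-Extended is a robots.txt control token that governs whether your content is used for Gemini model training and grounding; it is not a Search indexing or ranking signal. Googlebot handles normal Search indexing separately, so you stay indexed.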

Is there a measurable upside to being in AI answers?

Yes — cited brands see roughly 35% higher CTR than uncited ones, and AI-referred visitors tend to convert better because they arrive informed.

Can I allow some crawlers and block others?

Yes. robots.txt rules are per user-agent, so you can allow answer bots that drive visibility while blocking pure training scrapers.

See Whether AI Answers Cite You

Before you touch robots.txt, get the facts. The AI Rank API and Google SERP API show exactly who AI engines cite for your queries.
