Should You Block or Allow AI Crawlers? A 2026 Decision Guide
It's the dilemma every publisher and marketing lead is wrestling with in 2026. Block the AI crawlers and protect your content — but disappear from the AI answers that more and more people now read instead of clicking. Or let them in — and risk feeding the very machine that's hollowing out your traffic.
The stakes are real. Some publishers have reported organic traffic dropping 70–80% as AI answers expanded; one education site reported nearly a 50% decline; certain news queries fell almost 90%. NPR called it an "extinction-level event" for online news. Against that, brands cited inside an AI Overview see around 35% higher click-through than those left out.
So there is no one-size-fits-all answer. It's a decision, and this guide gives you a clear, numbers-based way to make it for your site, plus the exact robots.txt lines once you've chosen.
Meet the crawlers (they're not all the same)
"AI crawler" lumps together bots that do very different jobs. The key distinction: training crawlers harvest content to train future models, while answer/grounding crawlers fetch pages to build the live answers users see now — the ones that can cite you.
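- GPTBot (OpenAI): primarily training. OAI-SearchBot handles ChatGPT search surfacing.
- Google-Extended: a robots.txt control token rather than a crawler with its own user-agent string. It governs use of your content for Gemini and related AI features, and is separate from Googlebot, which still handles normal Search.
- ClaudeBot (Anthropic) and PerplexityBot: treated in this guide as grounding and answer crawlers for their assistants. Vendors split and rename agents over time (Anthropic, for example, documents separate agents for training, user-initiated fetches, and search), so confirm each bot's current role in the vendor's documentation before relying on this grouping.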
That difference is the whole game: you can welcome the bots that put you in answers while declining the ones that only feed training sets.
The two real questions
Strip away the noise and your decision comes down to two:
1. Does AI visibility help or hurt your business model? If you sell products, services, or software, being recommended by ChatGPT is a top-of-funnel win — the click matters less than the recommendation. If you are the content (ad-funded media, a reference site), every answered query without a click is lost revenue.
2. Are you already getting cited? If AI engines cite you and send qualified visitors, blocking would cut a working channel. If they scrape you and never cite you, you're getting the worst of both, and blocking the training bots costs you little.
The decision matrix
Mapping the two questions gives four practical positions:
- Sell something + already cited → Allow, and optimize. AI answers are feeding you qualified leads. Lean in: make yourself easier to cite with clean structured data.
- Sell something + not cited yet → Allow, and earn citations. Blocking now would just guarantee invisibility. Open the door and work on being the answer.
- You are the content + cited with real referral traffic → Allow selectively. Keep the answer bots that send clicks; watch the ratio closely.
- You are the content + scraped but never cited → Block training bots. You're subsidizing models that don't return the favour. Block them and pursue licensing where it makes sense.
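The four positions above can be written down as a small lookup, which makes the logic easy to review with colleagues. This is a minimal sketch: the function name `recommend_policy` and its two boolean parameters are hypothetical labels for the two questions, not part of any tool mentioned in this guide.

```python
def recommend_policy(sells_something: bool, already_cited: bool) -> str:
    """Map the two questions onto the four positions of the matrix.

    sells_something: True if you sell products, services, or software;
                     False if the content itself is the product.
    already_cited:   True if AI answers cite you today (for content
                     sites, with real referral traffic).
    """
    if sells_something:
        if already_cited:
            return "Allow, and optimize"
        return "Allow, and earn citations"
    if already_cited:
        return "Allow selectively"
    return "Block training bots"
```

For example, an ad-funded reference site that is scraped but never cited would call `recommend_policy(False, False)` and land on "Block training bots".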
Before you decide, get the data. Check whether AI answers actually cite you today with the AI Rank API and the Google SERP API.
The robots.txt lines
Once you've chosen, the implementation is a few lines at your site root. To block training crawlers but keep answer visibility:
# robots.txt — block training, allow answer/grounding bots
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
# (left allowed by omission: OAI-SearchBot, PerplexityBot,
# ClaudeBot, and Googlebot for normal Search)
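Before deploying, it can help to confirm that the rules do what you intend. The sketch below feeds the selective policy to Python's standard-library `urllib.robotparser` and reports which agents may fetch a page. The helper name `allowed_agents` and the `example.com` URL are illustrative assumptions, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The selective policy from above: block training, allow answer bots.
SELECTIVE_POLICY = """\
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
"""

def allowed_agents(robots_txt, agents, url="https://example.com/article"):
    """Return {agent: True/False} for whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

AGENTS = ["GPTBot", "Google-Extended", "OAI-SearchBot",
          "PerplexityBot", "ClaudeBot", "Googlebot"]
result = allowed_agents(SELECTIVE_POLICY, AGENTS)
```

With this policy, `result` marks GPTBot and Google-Extended as disallowed and leaves the other four agents allowed, which matches the "allowed by omission" comment above.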
To block everything AI while staying in classic Google Search:
User-agent: GPTBot
Disallow: /
User-agent: OAI-SearchBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
# Googlebot is NOT listed here, so normal Search indexing continues.
Two cautions. First, robots.txt is a polite request, not a wall — reputable bots honour it, but it isn't enforcement. Second, blocking Google-Extended does not remove you from Google Search; Googlebot is a separate agent. If you also want to publish an AI-friendly content map, see our llms.txt guide.
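Because robots.txt is not enforcement, one way to see whether blocked bots keep requesting pages is to scan your access logs. The following is a rough sketch that assumes logs in the common "combined" format; the function name `blocked_bot_hits` and the token list are illustrative. User-agent strings can be spoofed, so treat the counts as a signal, not proof. Google-Extended is left out because it has no user-agent string of its own and will not appear in logs.

```python
import re

# Matches the request, status, size, referer, and user-agent fields
# of a combined-format access log line (an assumption about your logs).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Agents you have disallowed in robots.txt (adjust to your policy).
BLOCKED_TOKENS = ("GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot")

def blocked_bot_hits(log_lines, blocked_tokens=BLOCKED_TOKENS):
    """Count page requests whose user agent claims to be a blocked bot."""
    counts = {}
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a line in the expected format
        if match.group("path") == "/robots.txt":
            continue  # fetching robots.txt itself is expected behaviour
        agent = match.group("agent").lower()
        for token in blocked_tokens:
            if token.lower() in agent:
                counts[token] = counts.get(token, 0) + 1
    return counts
```

A non-empty result a few days after deploying the rules suggests that something identifying itself as a blocked bot is still fetching pages, at which point server-level or CDN-level blocking is the next thing to look at.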
Decide with data, not vibes
The mistake is treating this as a one-time gut call. Treat it as an experiment:
- Baseline. Measure how often AI engines cite you today, and how much referral traffic they send (check your analytics for ChatGPT, Perplexity, and Gemini referrers).
- Change one thing. Adjust your robots.txt, or invest in getting cited.
- Re-measure in 4–6 weeks. Did citations and referral traffic rise or fall? Let the numbers, not the headlines, drive the next move.
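The baseline and re-measure steps can be sketched in a few lines once you can export referrer URLs from your analytics. The hostnames in `AI_REFERRER_HOSTS` are assumptions about how these assistants show up as referrers, so check them against your own data. The function names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames; verify against your own analytics export.
AI_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}

def count_ai_referrals(referrer_urls):
    """Count visits per AI assistant from a list of referrer URLs."""
    counts = {name: 0 for name in AI_REFERRER_HOSTS.values()}
    for url in referrer_urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        name = AI_REFERRER_HOSTS.get(host)
        if name is not None:
            counts[name] += 1
    return counts

def percent_change(baseline, current):
    """Percentage change from baseline; None if the baseline is zero."""
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100
```

Run `count_ai_referrals` on the export from before the change and again 4–6 weeks later, then compare each assistant's totals with `percent_change`. A baseline of 40 visits rising to 50 is a 25% increase.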
The companies that win this aren't the ones with the strongest opinion — they're the ones measuring. Track your citation share (the AI share-of-voice tracker is built for exactly this) and connect it to traffic with The Great Decoupling. Then your robots.txt becomes a tested decision you can defend, not a guess.
FAQ
What happens if I block AI crawlers?
You stop your content feeding those models — and you also become much less likely to be cited in their answers, which increasingly drive the surviving clicks.
Does blocking Google-Extended remove me from Google Search?
No. Google-Extended only governs Gemini and AI features. Googlebot handles normal Search indexing separately, so you stay indexed.
Is there a measurable upside to being in AI answers?
Yes — cited brands see roughly 35% higher CTR than uncited ones, and AI-referred visitors tend to convert better because they arrive informed.
Can I allow some crawlers and block others?
Yes. robots.txt rules are per user-agent, so you can allow answer bots that drive visibility while blocking pure training scrapers.
See Whether AI Answers Cite You
Before you touch robots.txt, get the facts. The AI Rank API and Google SERP API show exactly who AI engines cite for your queries.