Which AI Bots Are You Actually Blocking? (GPTBot, ClaudeBot, Perplexity & More)
- June 4, 2026
- AI Search
"Block AI bots" sounds like one switch. It is not. Every AI company runs several crawlers with different jobs, and blocking the wrong one is how sites accidentally disappear from ChatGPT, Claude and Perplexity answers while believing they only opted out of model training. If you are going to make a decision about AI crawlers — and you should — make it the right one.
Every AI bot has one of three jobs
- Training — collects content that may train future models. Blocking it affects whether you contribute to the model, not whether you show up in answers today.
- Search / indexing — builds the live index the assistant retrieves from. Block this and you vanish from that product's cited answers.
- User-fetch — grabs a specific URL when a human asks the assistant to read it. Block this and the assistant cannot open your page on request.
That distinction is the whole game. Here is who's who.
The 2026 AI crawler field guide
| Bot (user agent) | Operator | Job | What blocking it does |
|---|---|---|---|
| GPTBot | OpenAI | Training | Excludes you from training data. Does not remove you from ChatGPT search. |
| OAI-SearchBot | OpenAI | Search index | Removes you from ChatGPT's search results and citations. |
| ChatGPT-User | OpenAI | User-fetch | ChatGPT can't open your page when a user asks it to. |
| ClaudeBot | Anthropic | Training | Excludes you from Claude training data only. |
| Claude-SearchBot | Anthropic | Search index | Removes you from Claude's search-backed answers. |
| Claude-User | Anthropic | User-fetch | Claude can't fetch your URL on a user's request. |
| PerplexityBot | Perplexity | Search index | Removes you from Perplexity's indexed answers. |
| Perplexity-User | Perplexity | User-fetch | Live fetches for user questions (historically does not honor robots.txt; control it at the firewall). |
| Googlebot | Google | Search index | Removes you from Google Search and AI Overviews (which use the search index). Renders JavaScript. |
| Google-Extended | Google | Training token | A robots.txt token, not a crawler. Opts you out of Gemini/Vertex training. Does not affect Search or AI Overviews. |
| Bingbot | Microsoft | Search index | Removes you from Bing and Microsoft Copilot. |
| CCBot | Common Crawl | Training (open dataset) | Excludes you from the open dataset many AI trainers use. |
Sources: OpenAI's crawler docs and Anthropic's documented agents.
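The table can also be written as a lookup, which is handy when you want to tally AI crawler hits in an access log by job. This is a minimal shell sketch, assuming the crawler token appears somewhere in the user-agent string. The function name bot_job is made up for illustration, and the mapping is only the table above, not an official list from any operator.

```shell
# bot_job: print the job (training, search, user-fetch) for a user-agent string.
# Hypothetical helper; the mapping mirrors the field guide table and nothing more.
bot_job() {
  # Lowercase first: real user-agent strings vary in case (for example "bingbot").
  ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$ua" in
    # Google-Extended is a robots.txt token and will not show up in logs;
    # it is listed so the function also accepts bare tokens.
    *gptbot*|*claudebot*|*ccbot*|*google-extended*) echo "training" ;;
    *oai-searchbot*|*claude-searchbot*|*perplexitybot*|*googlebot*|*bingbot*) echo "search" ;;
    *chatgpt-user*|*claude-user*|*perplexity-user*) echo "user-fetch" ;;
    *) echo "unknown" ;;
  esac
}
```

Calling bot_job with a full user-agent string from a log line returns one of the three jobs, or "unknown" for anything not in the table. A match on the string alone does not prove the request came from the named operator, since user agents are trivially spoofed.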
The mistake almost everyone makes
Someone reads a scary headline about AI scraping, drops a blanket Disallow for "AI bots," or a security plugin/WAF starts 403'ing anything that looks like a bot. The result: you block OAI-SearchBot, Claude-SearchBot and PerplexityBot too — the exact crawlers that decide whether you get cited — while thinking you only said no to training. (For real, reproducible examples of big sites doing this, see The Forgotten HTML.)
The two robots.txt recipes that actually make sense
1. "I want AI citations, just not to train the models." Allow the search and user bots; block the training ones:
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# leave these ALLOWED (default):
# OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Googlebot, Bingbot
2. "Block everything AI." Disallow all of the above — but understand you are opting out of AI-answer visibility entirely, and accept the traffic trade-off.
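To confirm that a deployed robots.txt matches the recipe you intended, you can check it token by token. The following is a rough sketch, not a full robots.txt parser: the function name robots_blocks is hypothetical, and it only detects a literal "Disallow: /" inside a group that names the token. It ignores "User-agent: *" groups, Allow overrides, and path-specific rules.

```shell
# robots_blocks TOKEN: read robots.txt on stdin and print "blocked" if a group
# naming TOKEN contains "Disallow: /", otherwise print "allowed".
# Simplified on purpose: no wildcard groups, no Allow precedence, no path rules.
robots_blocks() {
  awk -v bot="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" '
    {
      sub(/#.*/, "")              # drop comments
      line = tolower($0)
      sub(/[ \t\r]+$/, "", line)  # drop trailing whitespace and CR
    }
    line ~ /^[ \t]*user-agent[ \t]*:/ {
      sub(/^[ \t]*user-agent[ \t]*:[ \t]*/, "", line)
      # A User-agent line that follows a rule line starts a new group.
      if (!reading_agents) in_group = 0
      reading_agents = 1
      if (line == bot) in_group = 1
      next
    }
    line ~ /^[ \t]*(allow|disallow)[ \t]*:/ {
      reading_agents = 0
      if (in_group && line ~ /^[ \t]*disallow[ \t]*:[ \t]*\/$/) blocked = 1
      next
    }
    END { print (blocked ? "blocked" : "allowed") }
  '
}
```

For example, piping the output of curl -s https://yoursite.com/robots.txt into robots_blocks GPTBot should print "blocked" under recipe 1, while the same check for OAI-SearchBot should print "allowed".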
One caveat: robots.txt is a request, not a wall. Some user-fetch bots ignore it, and a firewall can override it in either direction. If it must be enforced, do it at the WAF/CDN.
How to check what you're actually blocking
Test each user agent directly. A 403 means you are blocking it; a 200 means you are not:
for ua in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-SearchBot PerplexityBot; do
  # Print the user agent, then the HTTP status code the site returns for it
  printf '%s: ' "$ua"
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://yoursite.com/
done
Then read your robots.txt, and check your CDN/WAF bot-management rules; that is where accidental blocks usually live.
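A status code is easier to interpret next to a baseline, because some sites reject every non-browser client and not just AI bots. Here is a small sketch that compares the code a bot user agent receives with the code a browser user agent receives. The function name access_verdict and its three labels are invented for illustration, and treating 403, 429 and 503 as block signals is an assumption, not a rule.

```shell
# access_verdict BOT_CODE BROWSER_CODE: compare two HTTP status codes.
# Hypothetical helper; the labels and the list of block-like codes are assumptions.
access_verdict() {
  bot_code=$1
  browser_code=$2
  if [ "$bot_code" = "$browser_code" ]; then
    # The bot is treated like a browser (both allowed, or both rejected).
    echo "same-as-browser"
  elif [ "$bot_code" = "403" ] || [ "$bot_code" = "429" ] || [ "$bot_code" = "503" ]; then
    echo "likely-blocked"
  else
    echo "differs"
  fi
}
```

To use it, capture one status with curl -s -o /dev/null -w '%{http_code}' -A 'Mozilla/5.0' https://yoursite.com/ and another with -A set to the bot token, then pass both to access_verdict. Keep in mind that a firewall may decide by IP address as well as user agent, so a spoofed test from your own machine is an approximation of what the real crawler sees.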








