Which AI Bots Are You Actually Blocking? (GPTBot, ClaudeBot, Perplexity & More)

June 4, 2026

"Block AI bots" sounds like one switch. It is not. Every AI company runs several crawlers with different jobs, and blocking the wrong one is how sites accidentally disappear from ChatGPT, Claude and Perplexity answers while believing they only opted out of model training. If you are going to make a decision about AI crawlers — and you should — make it the right one.

Figure: The three jobs an AI bot can have, and what blocking each one costs you. Three columns: train (GPTBot, ClaudeBot...), search/index (OAI-SearchBot, PerplexityBot...), user-fetch (ChatGPT-User...).

Every AI bot has one of three jobs

Training — collects content that may train future models. Blocking it affects whether you contribute to the model, not whether you show up in answers today.
Search / indexing — builds the live index the assistant retrieves from. Block this and you vanish from that product's cited answers.
User-fetch — grabs a specific URL when a human asks the assistant to read it. Block this and the assistant cannot open your page on request.

That distinction is the whole game. Here is who's who.

The 2026 AI crawler field guide

Bot (user agent)	Operator	Job	What blocking it does
`GPTBot`	OpenAI	Training	Excludes you from training data. Does not remove you from ChatGPT search.
`OAI-SearchBot`	OpenAI	Search index	Removes you from ChatGPT's search results and citations.
`ChatGPT-User`	OpenAI	User-fetch	ChatGPT can't open your page when a user asks it to.
`ClaudeBot`	Anthropic	Training	Excludes you from Claude training data only.
`Claude-SearchBot`	Anthropic	Search index	Removes you from Claude's search-backed answers.
`Claude-User`	Anthropic	User-fetch	Claude can't fetch your URL on a user's request.
`PerplexityBot`	Perplexity	Search index	Removes you from Perplexity's indexed answers.
`Perplexity-User`	Perplexity	User-fetch	Perplexity can't fetch your URL live for a user's question. Historically it has not honored robots.txt, so control it at the firewall.
`Googlebot`	Google	Search index	Removes you from Google Search and AI Overviews (which use the search index). Renders JavaScript.
`Google-Extended`	Google	Training token	A robots.txt token, not a crawler. Opts you out of Gemini/Vertex training. Does not affect Search or AI Overviews.
`Bingbot`	Microsoft	Search index	Removes you from Bing and Microsoft Copilot.
`CCBot`	Common Crawl	Training (open dataset)	Excludes you from the open dataset many AI trainers use.

Sources: OpenAI's crawler docs and Anthropic's documented agents.
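
To make the table usable in scripts, here is a minimal sketch that maps each token to its job. The function name `bot_job` and the labels it prints are assumptions for illustration, and it matches exact tokens only, while real request headers carry longer user-agent strings that contain these tokens.

```shell
#!/usr/bin/env bash
# Map an AI user-agent token to its job, following the field-guide table.
# The function name and the output labels are illustrative assumptions.
bot_job() {
  case "$1" in
    GPTBot|ClaudeBot|CCBot|Google-Extended)
      echo "training" ;;
    OAI-SearchBot|Claude-SearchBot|PerplexityBot|Googlebot|Bingbot)
      echo "search-index" ;;
    ChatGPT-User|Claude-User|Perplexity-User)
      echo "user-fetch" ;;
    *)
      echo "unknown" ;;
  esac
}

# Print the job next to each token so a blocklist can be reviewed at a glance.
for ua in GPTBot OAI-SearchBot ChatGPT-User; do
  printf '%s\t%s\n' "$ua" "$(bot_job "$ua")"
done
```

Anything that comes back as `search-index` or `user-fetch` is a token you probably do not want in a training-only blocklist.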

The mistake almost everyone makes

Someone reads a scary headline about AI scraping and drops a blanket Disallow for "AI bots," or a security plugin/WAF starts returning 403 to anything that looks like a bot. The result: you block OAI-SearchBot, Claude-SearchBot and PerplexityBot too, the exact crawlers that decide whether you get cited, while thinking you only said no to training. (For real, reproducible examples of big sites doing this, see The Forgotten HTML.)

The two robots.txt recipes that actually make sense

1. "I want AI citations, just not to train the models." Allow the search and user bots; block the training ones:

User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Google-Extended
Disallow: /

# leave these ALLOWED (default):
# OAI-SearchBot, ChatGPT-User, Claude-SearchBot, Claude-User, PerplexityBot, Googlebot, Bingbot
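
One way to sanity-check a recipe like this is to list which user agents a robots.txt fully disallows and see whether any citation-deciding crawler slipped in. The sketch below is a deliberately simplified reader, not a full robots.txt parser: it only understands `User-agent:` groups and `Disallow: /`, and it ignores path-specific rules and wildcards in paths. The function names `fully_blocked_agents` and `citation_bots_blocked` are hypothetical.

```shell
#!/usr/bin/env bash
# List every user agent that a robots.txt (read from stdin) fully disallows.
# Simplified sketch: only "User-agent:" groups and "Disallow: /" are recognized.
fully_blocked_agents() {
  awk '
    { sub(/\r$/, ""); sub(/#.*/, "") }             # strip CR and comments
    tolower($0) ~ /^[ \t]*user-agent[ \t]*:/ {
      if (in_rules) { n = 0; in_rules = 0 }         # previous group ended
      value = $0
      sub(/^[^:]*:[ \t]*/, "", value)               # drop the field name
      sub(/[ \t]+$/, "", value)                     # drop trailing space
      agents[++n] = value
      next
    }
    tolower($0) ~ /^[ \t]*disallow[ \t]*:[ \t]*\/[ \t]*$/ {
      in_rules = 1
      for (i = 1; i <= n; i++) print agents[i]      # whole site disallowed
      next
    }
    tolower($0) ~ /^[ \t]*(allow|disallow)[ \t]*:/ { in_rules = 1 }
  '
}

# From a list of blocked agents on stdin, keep the ones that decide citations.
# "*" is included because a blanket group applies to any bot without its own group.
citation_bots_blocked() {
  grep -i -E -x 'OAI-SearchBot|Claude-SearchBot|PerplexityBot|Googlebot|Bingbot|\*' || true
}
```

Assuming your file lives at the usual path, `curl -s https://yoursite.com/robots.txt | fully_blocked_agents | citation_bots_blocked` should print nothing if recipe 1 is in place; any output names a search crawler you are turning away.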

2. "Block everything AI." Disallow all of the above — but understand you are opting out of AI-answer visibility entirely, and accept the traffic trade-off.
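
If you do choose this route, the rules can be generated rather than typed by hand. The following is a sketch under stated assumptions: `print_block_rules` and `AI_ONLY_AGENTS` are hypothetical names, and the list deliberately leaves out Googlebot and Bingbot because, per the field guide, blocking those also removes you from regular Google and Bing search. Add them only if that is really what you intend.

```shell
#!/usr/bin/env bash
# Print one "Disallow: /" group per user-agent token passed as an argument.
print_block_rules() {
  for ua in "$@"; do
    printf 'User-agent: %s\nDisallow: /\n\n' "$ua"
  done
}

# Hypothetical "block everything AI" list built from the field guide.
# Googlebot and Bingbot are omitted on purpose (assumption: you still want
# classic search traffic).
AI_ONLY_AGENTS="GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-SearchBot Claude-User PerplexityBot Perplexity-User CCBot Google-Extended"

# Intentionally unquoted so the list splits into one argument per token.
print_block_rules $AI_ONLY_AGENTS
```

Redirect the output into your robots.txt, and remember that bots which ignore robots.txt still need a firewall rule.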

One caveat: robots.txt is a request, not a wall. Some user-fetch bots ignore it, and a firewall can override it in either direction. If it must be enforced, do it at the WAF/CDN.

How to check what you're actually blocking

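Test each user agent directly. A 403 means that user agent is blocked; a 200 means it gets through. This only exercises rules keyed on the user-agent string, since the request still comes from your own IP: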

# Print the HTTP status each AI user agent receives from your homepage.
for ua in GPTBot OAI-SearchBot ChatGPT-User ClaudeBot Claude-SearchBot PerplexityBot; do
  printf '%s: ' "$ua"
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://yoursite.com/
done
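
Raw status codes are easier to scan with a verdict next to them. The helper below is a rough sketch: only "403 means blocked" comes from this article, and the other labels, along with the function names `classify_status` and `check_bot`, are illustrative assumptions. curl reports `000` when it gets no HTTP response at all.

```shell
#!/usr/bin/env bash
# Turn an HTTP status code into a rough verdict for one user agent.
# Labels other than "blocked" for 403 are illustrative assumptions.
classify_status() {
  case "$1" in
    2??)     echo "allowed" ;;
    3??)     echo "redirect" ;;      # re-test the redirect target
    401|403) echo "blocked" ;;
    429)     echo "rate-limited" ;;
    000)     echo "no-response" ;;   # curl got no HTTP reply
    *)       echo "other" ;;
  esac
}

# Fetch a URL as the given user agent and print the code with its verdict.
# $1 = user-agent token, $2 = URL to test
check_bot() {
  code="$(curl -s -o /dev/null -w '%{http_code}' -A "$1" "$2")"
  printf '%s: %s (%s)\n' "$1" "$code" "$(classify_status "$code")"
}
```

For example, `check_bot OAI-SearchBot https://yoursite.com/` prints one line in the form `OAI-SearchBot: <code> (<verdict>)`.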

Then read your robots.txt, and check your CDN/WAF bot-management rules — that is where accidental blocks usually live.


Claude Vincent

Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.
