AI Crawler Comparison: GPTBot, ClaudeBot, PerplexityBot Complete Guide

January 1, 2025
AI Search


AI companies run several crawlers for distinct purposes, and each one behaves differently with respect to robots.txt compliance, JavaScript rendering, and crawl patterns. Understanding these differences is essential for controlling AI access to your content.

AI Crawler Overview

Crawler	Company	Purpose	Robots.txt	Crawl-delay	JS Rendering
GPTBot	OpenAI	AI model training	Respects	No	No
ChatGPT-User	OpenAI	Real-time browsing	Respects	No	No
OAI-SearchBot	OpenAI	Search indexing	Respects	No	No
ClaudeBot	Anthropic	Training + retrieval	Respects	Yes	No
PerplexityBot	Perplexity	Indexing	Controversial	No	No
Google-Extended	Google	AI training control	Control token	N/A	N/A
Googlebot	Google	Search + AI features	Respects	No	Yes
Applebot	Apple	Siri/Spotlight	Respects	No	Yes

Source: Vercel: The Rise of the AI Crawler

OpenAI Crawlers (GPTBot, ChatGPT-User, OAI-SearchBot)

According to OpenAI's documentation, OpenAI operates three distinct crawlers:

GPTBot

Purpose: Training data collection for AI models
User agent: GPTBot
Robots.txt: Respects directives
IP ranges: Published by OpenAI
Block this if: You don't want content used for AI training
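Because a User-Agent header is easy to forge, published IP ranges give you a second check. The snippet below is a minimal sketch of that check using Python's standard ipaddress module. The CIDR blocks in PUBLISHED_RANGES are hypothetical placeholders taken from the RFC 5737 documentation ranges, not OpenAI's real ranges; substitute the list OpenAI currently publishes.

```python
import ipaddress

# Hypothetical placeholder ranges (RFC 5737 documentation blocks).
# Replace with the CIDR list the crawler operator actually publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def ip_in_ranges(ip: str, cidr_ranges: list[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def looks_like_genuine_gptbot(user_agent: str, ip: str) -> bool:
    """Require both a GPTBot user agent and a source IP in a published range."""
    return "gptbot" in user_agent.lower() and ip_in_ranges(ip, PUBLISHED_RANGES)


print(looks_like_genuine_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0)", "192.0.2.17"))
print(looks_like_genuine_gptbot("Mozilla/5.0 (compatible; GPTBot/1.0)", "203.0.113.5"))
```

A request that claims to be GPTBot but arrives from outside the published ranges is a candidate for blocking at the server or CDN level, since robots.txt cannot stop a spoofed agent.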

ChatGPT-User

Purpose: Real-time browsing triggered by user queries
User agent: ChatGPT-User
Robots.txt: Respects directives
Block this if: You don't want to appear in ChatGPT's live responses

OAI-SearchBot

Purpose: Indexing for ChatGPT search (first released as the SearchGPT prototype)
User agent: OAI-SearchBot
Robots.txt: Respects directives
Block this if: You don't want to appear in ChatGPT search results

Robots.txt Example (OpenAI)

# Block AI training, allow search features
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
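Before deploying rules like these, it can help to confirm they do what you expect. The following is a rough sketch that feeds the example above to Python's standard urllib.robotparser and asks which agents may fetch a page. The URL is a placeholder, and Python's parser only approximates how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block AI training, allow search features
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""


def check_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether the rules allow it to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# example.com is a placeholder domain.
result = check_access(
    ROBOTS_TXT,
    ["GPTBot", "ChatGPT-User", "OAI-SearchBot"],
    "https://example.com/article",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```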

Anthropic Crawler (ClaudeBot)

According to Anthropic's documentation, ClaudeBot is unique among AI crawlers in supporting the Crawl-delay directive.

Purpose: Training data and real-time retrieval
User agent: ClaudeBot
Robots.txt: Respects directives
Crawl-delay: Supported (unique among major AI crawlers)
IP ranges: Not published (uses service provider IPs)
Legacy agents: Also honors ANTHROPIC-AI and CLAUDE-WEB

Robots.txt Example (Anthropic)

User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Legacy support
User-agent: ANTHROPIC-AI
Allow: /

Note: 404 Media reported that iFixit saw ClaudeBot hit their site nearly 1 million times in 24 hours. Use Crawl-delay if you allow ClaudeBot.
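To see what a Crawl-delay of 10 means in practice, the sketch below reads the value back with urllib.robotparser and converts it into a daily request ceiling. It assumes the crawler treats the delay as a minimum gap in seconds between consecutive requests; Crawl-delay is non-standard, so the exact interpretation is an assumption here.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Legacy support
User-agent: ANTHROPIC-AI
Allow: /
"""

SECONDS_PER_DAY = 24 * 60 * 60


def crawl_delay_for(robots_txt: str, agent: str) -> Optional[int]:
    """Return the Crawl-delay (seconds) declared for `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


def max_requests_per_day(delay_seconds: int) -> int:
    """Upper bound on daily requests, assuming one request per delay interval."""
    return SECONDS_PER_DAY // delay_seconds


delay = crawl_delay_for(ROBOTS_TXT, "ClaudeBot")
print(delay, max_requests_per_day(delay))
```

Under that assumption, a 10-second delay caps a compliant crawler at 8,640 requests per day, roughly two orders of magnitude below the volume reported in the iFixit incident.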

Perplexity Crawler (PerplexityBot)

PerplexityBot has been controversial due to documented stealth crawling behavior.

Purpose: Supplementary indexing
User agent: PerplexityBot
Robots.txt: Compliance disputed (see Documented Issues)
IP ranges: Not published

Documented Issues

Independent researchers have documented concerns about PerplexityBot:

Stealth crawling with modified user agents disguised as Chrome browsers
Ignoring robots.txt through undeclared crawlers
Rotating IPs and ASNs to evade blocks
Using headless browsers with generic Chrome user agents

Important: Even if you block PerplexityBot, Perplexity can still access your content through Google and Bing APIs.
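The practical consequence is that matching on the User-Agent header only catches crawlers that declare themselves. The sketch below illustrates that limit. The token list comes from the crawlers covered in this guide, while both sample header strings are illustrative examples rather than captured traffic.

```python
from typing import Optional

# Tokens for the declared AI crawlers discussed in this guide.
DECLARED_AI_AGENTS = (
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
)


def declared_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the matching crawler token, or None if no declared token appears."""
    ua = user_agent.lower()
    for token in DECLARED_AI_AGENTS:
        if token.lower() in ua:
            return token
    return None


# Illustrative header values, not real log entries.
declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
generic = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(declared_ai_crawler(declared))
print(declared_ai_crawler(generic))
```

A crawler that presents a generic Chrome user agent returns None here, which is why the stealth behavior described above cannot be handled by robots.txt or user-agent rules alone.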

Google AI Crawlers

Google-Extended (Not Actually a Crawler)

According to Google's documentation, Google-Extended is a robots.txt control token, not a crawler. It has no user agent string of its own; the crawling itself is done by Google's existing crawlers.

Purpose: Control AI training and Gemini grounding
What it blocks: AI model training, Gemini grounding
What it doesn't block: Search indexing, AI Overviews

User-agent: Google-Extended
Disallow: /
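A quick way to convince yourself that this rule leaves Googlebot alone is to parse it and query both tokens. This is only a sketch using Python's urllib.robotparser, whose group matching approximates but does not replicate Google's own parser; the URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

GOOGLE_EXTENDED_RULES = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(GOOGLE_EXTENDED_RULES.splitlines())

url = "https://example.com/article"  # placeholder URL
extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("Google-Extended:", extended_allowed)
print("Googlebot:", googlebot_allowed)
```

The Google-Extended group matches only that token, so Googlebot falls through to the default of being allowed.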

Googlebot

Standard Googlebot handles both traditional search and AI Overview features.

JavaScript rendering: Yes (one of few AI-related crawlers with JS support)
AI Overviews: Uses existing Googlebot index
Blocking: Not an option if you want search visibility

Crawl-to-Refer Ratios

According to Cloudflare's research, traditional crawlers send traffic back to sites. AI crawlers extract content with minimal referrals:

Platform	Crawl-to-refer ratio	Meaning
Googlebot (traditional)	3:1	1 referral per 3 crawls
OpenAI crawlers	3,700:1	Massive extraction, minimal traffic
Anthropic crawlers	25,000-100,000:1	Highest extraction ratio
Perplexity	200:1	Most favorable among AI platforms
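To make these ratios concrete, the short calculation below converts each one into the number of referrals you would expect per one million crawl requests. The ratios are the figures from the table above; the one-million baseline is an arbitrary choice for illustration.

```python
# Crawls per referral, taken from the table above.
CRAWLS_PER_REFERRAL = {
    "Googlebot (traditional)": 3,
    "Perplexity": 200,
    "OpenAI crawlers": 3_700,
    "Anthropic crawlers (low estimate)": 25_000,
    "Anthropic crawlers (high estimate)": 100_000,
}


def expected_referrals(crawls: int, crawls_per_referral: float) -> float:
    """Referrals implied by a crawl-to-refer ratio for a given crawl volume."""
    return crawls / crawls_per_referral


BASELINE_CRAWLS = 1_000_000  # arbitrary illustration volume

for platform, ratio in CRAWLS_PER_REFERRAL.items():
    referrals = expected_referrals(BASELINE_CRAWLS, ratio)
    print(f"{platform}: about {referrals:,.0f} referrals per {BASELINE_CRAWLS:,} crawls")
```

At that volume, the Googlebot ratio implies hundreds of thousands of referrals, while the Anthropic range implies between 10 and 40.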

Complete Robots.txt Template

# OpenAI Crawlers
User-agent: GPTBot
Disallow: /  # Block training

User-agent: ChatGPT-User
Allow: /  # Allow live browsing

User-agent: OAI-SearchBot
Allow: /  # Allow search indexing

# Anthropic Crawler
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Perplexity Crawler
User-agent: PerplexityBot
Disallow: /  # Block due to controversial behavior

# Google AI Training
User-agent: Google-Extended
Disallow: /  # Block AI training, keep search
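If you maintain these rules across several sites, generating the file from a single policy table can reduce copy-paste errors. The sketch below is one possible approach, not a standard tool: the POLICY structure and the render_robots helper are hypothetical names introduced here, and the output is parsed back with urllib.robotparser as a sanity check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy table mirroring the template above.
POLICY = {
    "GPTBot": {"allow": False},
    "ChatGPT-User": {"allow": True},
    "OAI-SearchBot": {"allow": True},
    "ClaudeBot": {"allow": True, "crawl_delay": 10},
    "PerplexityBot": {"allow": False},
    "Google-Extended": {"allow": False},
}


def render_robots(policy: dict) -> str:
    """Render one robots.txt group per user agent in `policy`."""
    groups = []
    for agent, rule in policy.items():
        lines = [
            f"User-agent: {agent}",
            "Allow: /" if rule["allow"] else "Disallow: /",
        ]
        if "crawl_delay" in rule:
            lines.append(f"Crawl-delay: {rule['crawl_delay']}")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots(POLICY)
print(robots_txt)

# Parse the generated file back to confirm it matches the policy.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
mismatches = [
    agent
    for agent, rule in POLICY.items()
    if parser.can_fetch(agent, "https://example.com/") != rule["allow"]
]
print("mismatches:", mismatches)
```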

Frequently Asked Questions

Can I block AI crawlers but keep search visibility?

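Partially. You can disallow the AI training user agents (GPTBot, ClaudeBot, and the Google-Extended token) while leaving search crawlers such as Googlebot and OAI-SearchBot allowed. Keep in mind that ClaudeBot also handles retrieval, so blocking it removes both uses, and that blocking every AI crawler may reduce your visibility in AI-powered search features.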

Why doesn't Crawl-delay work for most AI crawlers?

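Crawl-delay is a non-standard directive that is not part of the Robots Exclusion Protocol (RFC 9309), so crawlers are free to ignore it. Of the AI crawlers covered here, only ClaudeBot supports it; the others, including GPTBot and PerplexityBot, ignore it.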

Should I block PerplexityBot?

Consider it due to documented compliance issues. However, Perplexity can still access your content through Google and Bing APIs even if you block the crawler.

Does blocking Google-Extended affect my search rankings?

No. Google-Extended only affects AI training and Gemini grounding. Regular search indexing and AI Overviews are unaffected.

admin
