AI Crawler Comparison: GPTBot, ClaudeBot, PerplexityBot Complete Guide
- January 1, 2025
- AI Search
AI companies deploy multiple crawlers for different purposes, each with different behaviors regarding robots.txt compliance, JavaScript rendering, and crawling patterns. Understanding these differences is essential for controlling AI access to your content.
AI Crawler Overview
| Crawler | Company | Purpose | Robots.txt | Crawl-delay | JS Rendering |
|---|---|---|---|---|---|
| GPTBot | OpenAI | AI model training | Respects | No | No |
| ChatGPT-User | OpenAI | Real-time browsing | Respects | No | No |
| OAI-SearchBot | OpenAI | Search indexing | Respects | No | No |
| ClaudeBot | Anthropic | Training + retrieval | Respects | Yes | No |
| PerplexityBot | Perplexity | Indexing | Controversial | No | No |
| Google-Extended | Google | AI training control | Control token | N/A | N/A |
| Googlebot | Google | Search + AI features | Respects | No | Yes |
| AppleBot | Apple | Siri/Spotlight | Respects | No | Yes |
Source: Vercel: The Rise of the AI Crawler
OpenAI Crawlers (GPTBot, ChatGPT-User, OAI-SearchBot)
According to OpenAI's documentation, OpenAI operates three distinct crawlers:
GPTBot
- Purpose: Training data collection for AI models
- User agent: GPTBot
- Robots.txt: Respects directives
- IP ranges: Published by OpenAI
- Block this if: You don't want content used for AI training
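Because OpenAI publishes IP ranges for GPTBot, a request that claims to be GPTBot can be cross-checked against them. The following is a minimal sketch using Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from the reserved documentation ranges, not OpenAI's real ranges, and `is_from_published_range` is a hypothetical helper name.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges (RFC 5737).
# These are NOT OpenAI's ranges; substitute the list OpenAI publishes.
PUBLISHED_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def is_from_published_range(ip: str, cidr_blocks: list[str]) -> bool:
    """Return True if the IP address falls inside any of the CIDR blocks."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(block) for block in cidr_blocks)


# A request whose user agent says "GPTBot" but whose source IP is outside
# the published ranges is worth treating as unverified.
print(is_from_published_range("192.0.2.17", PUBLISHED_RANGES))
print(is_from_published_range("203.0.113.5", PUBLISHED_RANGES))
```

The same check does not carry over to crawlers that do not publish IP ranges, where the user agent string is the only declared signal.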
ChatGPT-User
- Purpose: Real-time browsing triggered by user queries
- User agent: ChatGPT-User
- Robots.txt: Respects directives
- Block this if: You don't want to appear in ChatGPT's live responses
OAI-SearchBot
- Purpose: SearchGPT indexing
- User agent: OAI-SearchBot
- Robots.txt: Respects directives
- Block this if: You don't want to appear in SearchGPT results
Robots.txt Example (OpenAI)
    # Block AI training, allow search features
    User-agent: GPTBot
    Disallow: /

    User-agent: ChatGPT-User
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /
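One way to sanity-check rules like these before deploying them is Python's standard-library `urllib.robotparser`. The sketch below parses the same directives from a string and asks whether each OpenAI user agent may fetch a URL. The `example.com` URL and the `check_access` helper are assumptions for illustration, and the result reflects how Python's parser reads the file, not a guarantee of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Block AI training, allow search features
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""


def check_access(robots_txt: str, user_agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether the parsed rules let it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


results = check_access(
    ROBOTS_TXT,
    ["GPTBot", "ChatGPT-User", "OAI-SearchBot"],
    "https://example.com/article",  # placeholder URL
)
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With these rules, the parser reports GPTBot as blocked and the other two agents as allowed.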
Anthropic Crawler (ClaudeBot)
According to Anthropic's documentation, ClaudeBot is unique among AI crawlers in supporting the Crawl-delay directive.
- Purpose: Training data and real-time retrieval
- User agent: ClaudeBot
- Robots.txt: Respects directives
- Crawl-delay: Supported (unique among major AI crawlers)
- IP ranges: Not published (uses service provider IPs)
- Legacy agents: Also honors ANTHROPIC-AI and CLAUDE-WEB
Robots.txt Example (Anthropic)
    User-agent: ClaudeBot
    Allow: /
    Crawl-delay: 10

    # Legacy support
    User-agent: ANTHROPIC-AI
    Allow: /
Note: 404 Media reported that iFixit saw ClaudeBot hit their site nearly 1 million times in 24 hours. Use Crawl-delay if you allow ClaudeBot.
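To put the Crawl-delay value in perspective, the sketch below reads the directive with Python's `urllib.robotparser` and works out the request ceiling it implies. It assumes the delay is interpreted as seconds between requests and that the crawler honors it; the parser only shows how the directive is read, not how ClaudeBot behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the value for the matching group, or None if no
# group (and no wildcard group) applies to the user agent.
claude_delay = parser.crawl_delay("ClaudeBot")
gpt_delay = parser.crawl_delay("GPTBot")

# Assuming one request per delay interval, a day has 86,400 seconds.
SECONDS_PER_DAY = 24 * 60 * 60
max_requests_per_day = SECONDS_PER_DAY // claude_delay

print(f"ClaudeBot delay: {claude_delay} s")
print(f"GPTBot delay: {gpt_delay}")
print(f"Ceiling at that delay: {max_requests_per_day} requests/day")
```

Under that assumption, a 10-second delay caps a compliant crawler at 8,640 requests per day, far below the roughly 1 million hits in 24 hours reported for iFixit.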
Perplexity Crawler (PerplexityBot)
PerplexityBot has been controversial due to documented stealth crawling behavior.
- Purpose: Supplementary indexing
- User agent: PerplexityBot
- Robots.txt: Controversial compliance
- IP ranges: Not published
Documented Issues
Independent researchers have documented concerns about PerplexityBot:
- Stealth crawling with modified user agents disguised as Chrome browsers
- Ignoring robots.txt through undeclared crawlers
- Rotating IPs and ASNs to evade blocks
- Using headless browsers with generic Chrome user agents
Important: Even if you block PerplexityBot, Perplexity can still access your content through Google and Bing APIs.
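The practical consequence of the issues listed above is that user agent matching only catches crawlers that identify themselves. The sketch below illustrates this with a hypothetical `classify_user_agent` helper. Both sample user agent strings are illustrative, not captured from real traffic.

```python
# User agent tokens named in this guide.
DECLARED_AI_AGENTS = (
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
)


def classify_user_agent(user_agent: str) -> str:
    """Return the declared AI crawler token found in the string, else 'undeclared'."""
    lowered = user_agent.lower()
    for token in DECLARED_AI_AGENTS:
        if token.lower() in lowered:
            return token
    return "undeclared"


# Illustrative strings: one declared crawler, one generic Chrome browser.
declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
generic_chrome = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(classify_user_agent(declared))
print(classify_user_agent(generic_chrome))
```

A request sent with a generic Chrome user agent falls into the `undeclared` bucket along with ordinary visitors, so neither robots.txt rules nor user agent filters can single it out.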
Google AI Crawlers
Google-Extended (Not Actually a Crawler)
According to Google's documentation, Google-Extended is a robots.txt control token, not a crawler. It uses existing Googlebot infrastructure.
- Purpose: Control AI training and Gemini grounding
- What it blocks: AI model training, Gemini grounding
- What it doesn't block: Search indexing, AI Overviews
    User-agent: Google-Extended
    Disallow: /
Googlebot
Standard Googlebot handles both traditional search and AI Overview features.
- JavaScript rendering: Yes (one of few AI-related crawlers with JS support)
- AI Overviews: Uses existing Googlebot index
- Blocking: Not an option if you want search visibility
Crawl-to-Refer Ratios
According to Cloudflare's research, traditional crawlers send traffic back to sites. AI crawlers extract content with minimal referrals:
| Platform | Ratio | Meaning |
|---|---|---|
| Googlebot (traditional) | 3:1 | 1 referral per 3 crawls |
| OpenAI crawlers | 3,700:1 | Massive extraction, minimal traffic |
| Anthropic crawlers | 25,000-100,000:1 | Highest extraction ratio |
| Perplexity | 200:1 | Most favorable among AI platforms |
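The ratios in the table are crawl requests divided by referral visits. A minimal sketch of that arithmetic for your own logs follows. The function names are hypothetical, and the counts are made-up numbers chosen only to reproduce the table's formatting, not measurements.

```python
def crawl_to_refer_ratio(crawls: int, referrals: int) -> float:
    """Crawl requests per referral visit over the same period."""
    if referrals <= 0:
        raise ValueError("referrals must be positive to compute a ratio")
    return crawls / referrals


def format_ratio(crawls: int, referrals: int) -> str:
    """Render the ratio in the N:1 style used in the table."""
    return f"{crawl_to_refer_ratio(crawls, referrals):,.0f}:1"


# Hypothetical log counts for one platform over one period.
print(format_ratio(7400, 2))
print(format_ratio(3, 1))
```

A higher number means more pages fetched for each visitor sent back, so 3,700:1 is far less favorable to the site than 3:1.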
Complete Robots.txt Template
    # OpenAI Crawlers
    User-agent: GPTBot
    Disallow: /  # Block training

    User-agent: ChatGPT-User
    Allow: /  # Allow live browsing

    User-agent: OAI-SearchBot
    Allow: /  # Allow search indexing

    # Anthropic Crawler
    User-agent: ClaudeBot
    Allow: /
    Crawl-delay: 10

    # Perplexity Crawler
    User-agent: PerplexityBot
    Disallow: /  # Block due to controversial behavior

    # Google AI Training
    User-agent: Google-Extended
    Disallow: /  # Block AI training, keep search
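If the policy changes often, the template can be generated from a small table of decisions instead of edited by hand. The sketch below is one possible approach, not a standard tool. The `POLICY` structure and `render_robots_txt` function are hypothetical names, and the output is parsed back with Python's `urllib.robotparser` as a self-check.

```python
from urllib.robotparser import RobotFileParser

# Same decisions as the template: one entry per user agent or control token.
POLICY = {
    "GPTBot": {"allow": False},
    "ChatGPT-User": {"allow": True},
    "OAI-SearchBot": {"allow": True},
    "ClaudeBot": {"allow": True, "crawl_delay": 10},
    "PerplexityBot": {"allow": False},
    "Google-Extended": {"allow": False},
}


def render_robots_txt(policy: dict[str, dict]) -> str:
    """Render one robots.txt group per agent, separated by blank lines."""
    groups = []
    for agent, rule in policy.items():
        lines = [f"User-agent: {agent}"]
        lines.append("Allow: /" if rule["allow"] else "Disallow: /")
        if "crawl_delay" in rule:
            lines.append(f"Crawl-delay: {rule['crawl_delay']}")
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots_txt(POLICY)
print(robots_txt)

# Parse the generated text back to confirm it says what was intended.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
```

Keeping the decisions in one structure makes it harder for a group to lose its `Disallow` line during manual edits.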
Frequently Asked Questions
Can I block AI crawlers but keep search visibility?
Partially. You can block AI training through the GPTBot and ClaudeBot user agents and the Google-Extended control token while still allowing search crawlers. But blocking all AI crawlers may reduce your visibility in AI-powered search features.
Why doesn't Crawl-delay work for most AI crawlers?
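Crawl-delay is a nonstandard extension to robots.txt, so each crawler decides whether to honor it. Among the AI crawlers covered here, only ClaudeBot supports it. Others, such as GPTBot and PerplexityBot, ignore the directive.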
Should I block PerplexityBot?
Consider it due to documented compliance issues. However, Perplexity can still access your content through Google and Bing APIs even if you block the crawler.
Does blocking Google-Extended affect my search rankings?
No. Google-Extended only affects AI training and Gemini grounding. Regular search indexing and AI Overviews are unaffected.
Sources
- Vercel: The Rise of the AI Crawler
- OpenAI: Bot Documentation
- Anthropic: ClaudeBot Documentation
- Google: Common Crawlers Overview
- Cloudflare: The Crawl-to-Click Gap
- 404 Media: ClaudeBot Crawling Behavior