AI Crawler Comparison: GPTBot, ClaudeBot, PerplexityBot Complete Guide


AI companies deploy multiple crawlers for different purposes, each with different behaviors regarding robots.txt compliance, JavaScript rendering, and crawling patterns. Understanding these differences is essential for controlling AI access to your content.

AI Crawler Overview

Crawler | Company | Purpose | Robots.txt | Crawl-delay support | JS rendering
GPTBot | OpenAI | AI model training | Respects | No | No
ChatGPT-User | OpenAI | Real-time browsing | Respects | No | No
OAI-SearchBot | OpenAI | Search indexing | Respects | No | No
ClaudeBot | Anthropic | Training + retrieval | Respects | Yes | No
PerplexityBot | Perplexity | Indexing | Controversial | No | No
Google-Extended | Google | AI training control | Control token | N/A | N/A
Googlebot | Google | Search + AI features | Respects | No | Yes
Applebot | Apple | Siri/Spotlight | Respects | No | Yes

Source: Vercel, "The Rise of the AI Crawler"

OpenAI Crawlers (GPTBot, ChatGPT-User, OAI-SearchBot)

According to OpenAI's documentation, OpenAI operates three distinct crawlers:

GPTBot

  • Purpose: Training data collection for AI models
  • User agent: GPTBot
  • Robots.txt: Respects directives
  • IP ranges: Published by OpenAI
  • Block this if: You don't want content used for AI training
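Because GPTBot's IP ranges are published, a server can check that a request claiming to be GPTBot really originates from OpenAI rather than from a scraper borrowing the name. The sketch below uses Python's standard `ipaddress` module. The CIDR blocks are placeholders taken from the RFC 5737 documentation ranges, not OpenAI's real list, and `is_verified_gptbot` is a hypothetical helper name.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation blocks), NOT OpenAI's real list.
# Load the ranges OpenAI actually publishes for GPTBot before relying on this.
PUBLISHED_GPTBOT_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_verified_gptbot(user_agent: str, remote_ip: str) -> bool:
    """Hypothetical helper: True only if the request claims to be GPTBot
    AND its source IP falls inside one of the listed ranges."""
    if "GPTBot" not in user_agent:
        return False  # the request does not claim to be GPTBot at all
    addr = ipaddress.ip_address(remote_ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions yield False.
    return any(addr in network for network in PUBLISHED_GPTBOT_RANGES)


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; GPTBot/1.1)"  # illustrative UA string
    print(is_verified_gptbot(ua, "192.0.2.44"))   # claimed and in range
    print(is_verified_gptbot(ua, "203.0.113.9"))  # claimed but out of range
```

Matching on the User-Agent header alone proves nothing, since any client can send it; the IP check is what makes the claim verifiable.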

ChatGPT-User

  • Purpose: Real-time browsing triggered by user queries
  • User agent: ChatGPT-User
  • Robots.txt: Respects directives
  • Block this if: You don't want to appear in ChatGPT's live responses

OAI-SearchBot

  • Purpose: Search indexing for SearchGPT (now ChatGPT search)
  • User agent: OAI-SearchBot
  • Robots.txt: Respects directives
  • Block this if: You don't want to appear in ChatGPT search results

Robots.txt Example (OpenAI)

# Block AI training, allow search features
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
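One way to confirm that these three groups behave as intended is to run them through Python's standard `urllib.robotparser`. This is a sketch, not a model of how OpenAI's crawlers parse the file: the standard-library parser applies the first matching rule in file order and matches agent names by substring, whereas RFC 9309 uses the longest matching rule. For single `/` rules like these the results agree. `access_by_agent` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# The OpenAI example above, copied verbatim.
OPENAI_ROBOTS_TXT = """\
# Block AI training, allow search features
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /
"""


def access_by_agent(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Parse a robots.txt string and report whether each agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the file as read
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    agents = ["GPTBot", "ChatGPT-User", "OAI-SearchBot"]
    results = access_by_agent(OPENAI_ROBOTS_TXT, agents, "https://example.com/blog/post")
    for agent, allowed in results.items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```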

Anthropic Crawler (ClaudeBot)

According to Anthropic's documentation, ClaudeBot supports the Crawl-delay directive, which makes it unique among the major AI crawlers.

  • Purpose: Training data and real-time retrieval
  • User agent: ClaudeBot
  • Robots.txt: Respects directives
  • Crawl-delay: Supported (unique among major AI crawlers)
  • IP ranges: Not published (uses service provider IPs)
  • Legacy agents: Also honors ANTHROPIC-AI and CLAUDE-WEB

Robots.txt Example (Anthropic)

User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Legacy support
User-agent: ANTHROPIC-AI
Allow: /

Note: 404 Media reported that iFixit saw ClaudeBot hit their site nearly 1 million times in 24 hours. Use Crawl-delay if you allow ClaudeBot.
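The iFixit figure is easier to weigh as a rate. Crawl-delay is not defined in RFC 9309; it is conventionally read as the minimum number of seconds between successive requests, and the sketch below assumes that reading plus a single crawler fetching one page at a time (parallel fetchers would not be bounded this way). It reads the delay from the Anthropic example with Python's `urllib.robotparser` and computes the resulting daily ceiling. `daily_request_ceiling` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# The Anthropic example above, copied verbatim.
ANTHROPIC_ROBOTS_TXT = """\
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Legacy support
User-agent: ANTHROPIC-AI
Allow: /
"""

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_request_ceiling(delay_seconds: int) -> int:
    """Max requests per day from ONE sequential crawler that waits
    delay_seconds between requests (an assumption, see lead-in)."""
    return SECONDS_PER_DAY // delay_seconds


parser = RobotFileParser()
parser.parse(ANTHROPIC_ROBOTS_TXT.splitlines())
delay = parser.crawl_delay("ClaudeBot")  # read from the ClaudeBot group

if __name__ == "__main__":
    print(delay, daily_request_ceiling(delay))
    print(round(1_000_000 / SECONDS_PER_DAY, 1))  # reported load, requests per second
```

Under those assumptions, a 10-second delay caps ClaudeBot at 8,640 requests per day, compared with about 11.6 requests per second in the reported incident. The `ANTHROPIC-AI` group sets no delay, so the parser returns `None` for that agent.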

Perplexity Crawler (PerplexityBot)

PerplexityBot has been controversial due to documented stealth crawling behavior.

  • Purpose: Supplementary indexing
  • User agent: PerplexityBot
  • Robots.txt: Controversial compliance
  • IP ranges: Not published

Documented Issues

Independent researchers have documented concerns about PerplexityBot:

  • Stealth crawling with modified user agents disguised as Chrome browsers
  • Ignoring robots.txt through undeclared crawlers
  • Rotating IPs and ASNs to evade blocks
  • Using headless browsers with generic Chrome user agents

Important: Even if you block PerplexityBot, Perplexity can still access your content through Google and Bing APIs.
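A server-side User-Agent filter can back up a robots.txt Disallow for crawlers that identify themselves, but the documented behavior above (generic Chrome user agents, rotating IPs) is exactly what such a filter cannot catch. The sketch below, with a hypothetical `should_block` helper and illustrative UA strings that are not Perplexity's exact headers, shows both sides of that limit.

```python
# Hypothetical backstop: refuse requests whose User-Agent names a declared
# crawler you have disallowed in robots.txt. Tokens are matched
# case-insensitively as substrings of the header.
BLOCKED_TOKENS = ("perplexitybot",)


def should_block(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_TOKENS)


if __name__ == "__main__":
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    disguised = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
    print(should_block(declared))   # the bot identifies itself, so it is caught
    print(should_block(disguised))  # a generic Chrome UA passes straight through
```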

Google AI Crawlers

Google-Extended (Not Actually a Crawler)

According to Google's documentation, Google-Extended is a robots.txt control token, not a crawler. It uses existing Googlebot infrastructure.

  • Purpose: Control AI training and Gemini grounding
  • What it blocks: AI model training, Gemini grounding
  • What it doesn't block: Search indexing, AI Overviews

User-agent: Google-Extended
Disallow: /

Googlebot

Standard Googlebot handles both traditional search and AI Overview features.

  • JavaScript rendering: Yes (one of few AI-related crawlers with JS support)
  • AI Overviews: Uses existing Googlebot index
  • Blocking: Not an option if you want search visibility, because the same crawler serves both search and AI Overviews
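Since most of the AI crawlers in the overview table do not render JavaScript, they see only the HTML your server returns. A rough check of what they see is to extract the text from the raw HTML while skipping `<script>` and `<style>` contents. This sketch uses Python's standard `html.parser`; the class and function names are hypothetical, and it only approximates a non-rendering crawler rather than emulating any specific bot.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def visible_without_js(raw_html: str, phrase: str) -> bool:
    """True if phrase appears in the text a non-JS crawler would read."""
    parser = VisibleTextParser()
    parser.feed(raw_html)
    parser.close()
    # Join text nodes with spaces and collapse whitespace (approximate).
    text = " ".join(" ".join(parser.chunks).split())
    return phrase in text


if __name__ == "__main__":
    ssr = "<html><body><h1>Pricing</h1><p>Plans start at $10</p></body></html>"
    spa = ('<html><body><div id="root"></div><script>'
           'document.getElementById("root").textContent = "Plans start at $10";'
           "</script></body></html>")
    print(visible_without_js(ssr, "Plans start at $10"))  # server-rendered text
    print(visible_without_js(spa, "Plans start at $10"))  # injected by JS only
```

Content that only appears after scripts run is invisible to a crawler that does not render JavaScript, even when Googlebot indexes it.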

Crawl-to-Refer Ratios

According to Cloudflare's research, traditional search crawlers send traffic back to the sites they crawl, while AI crawlers extract content with minimal referrals. Ratios are crawls per referral:

Platform | Crawl-to-refer ratio | Meaning
Googlebot (traditional) | 3:1 | 1 referral per 3 crawls
OpenAI crawlers | 3,700:1 | Massive extraction, minimal traffic
Anthropic crawlers | 25,000:1 to 100,000:1 | Highest extraction ratio
Perplexity | 200:1 | Most favorable among AI platforms
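To make these ratios concrete, the sketch below converts each one into the referrals implied by 100,000 crawls, and shows how you might compute your own site's ratio. The crawl and referral counts would come from your own logs; how you decide that a visit counts as a referral (for example, from the Referer header) is an assumption the source does not specify. Both helper names are hypothetical.

```python
# Ratios from the table above, as crawls per referral.
# Anthropic is given as a range, so both ends are listed.
CRAWLS_PER_REFERRAL = {
    "Googlebot (traditional)": 3,
    "OpenAI crawlers": 3_700,
    "Anthropic crawlers (low end)": 25_000,
    "Anthropic crawlers (high end)": 100_000,
    "Perplexity": 200,
}


def expected_referrals(crawls: int, crawls_per_referral: int) -> float:
    """Referrals implied by a crawl count at a given crawl-to-refer ratio."""
    return crawls / crawls_per_referral


def crawl_to_refer(crawls: int, referrals: int) -> float:
    """Your own ratio: crawler hits divided by referred visits."""
    if referrals == 0:
        return float("inf")  # crawled, but sent no one back
    return crawls / referrals


if __name__ == "__main__":
    for platform, ratio in CRAWLS_PER_REFERRAL.items():
        n = expected_referrals(100_000, ratio)
        print(f"{platform}: {n:,.1f} referrals per 100,000 crawls")
```

At these ratios, 100,000 crawls would return about 33,333 visits from traditional Googlebot but only about 27 from OpenAI's crawlers and between 1 and 4 from Anthropic's.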

Complete Robots.txt Template

# OpenAI Crawlers
User-agent: GPTBot
Disallow: /  # Block training

User-agent: ChatGPT-User
Allow: /  # Allow live browsing

User-agent: OAI-SearchBot
Allow: /  # Allow search indexing

# Anthropic Crawler
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Perplexity Crawler
User-agent: PerplexityBot
Disallow: /  # Block due to controversial behavior

# Google AI Training
User-agent: Google-Extended
Disallow: /  # Block AI training, keep search
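Before deploying the template, it is worth confirming that each group does what its comment says. This sketch feeds the template to Python's `urllib.robotparser` and compares the result with a hand-written `EXPECTED` map (a hypothetical name) derived from those comments. Two caveats: the standard-library parser is simplified (first matching rule wins and agent names match by substring, unlike RFC 9309's longest-match rule), and it treats Google-Extended as an ordinary agent name, so this checks the file's intent rather than how Google applies the token.

```python
from urllib.robotparser import RobotFileParser

# The complete template above, copied verbatim. Inline "# ..." comments are
# fine: the parser discards everything after '#'.
TEMPLATE = """\
# OpenAI Crawlers
User-agent: GPTBot
Disallow: /  # Block training

User-agent: ChatGPT-User
Allow: /  # Allow live browsing

User-agent: OAI-SearchBot
Allow: /  # Allow search indexing

# Anthropic Crawler
User-agent: ClaudeBot
Allow: /
Crawl-delay: 10

# Perplexity Crawler
User-agent: PerplexityBot
Disallow: /  # Block due to controversial behavior

# Google AI Training
User-agent: Google-Extended
Disallow: /  # Block AI training, keep search
"""

# Intent taken from the template's comments: True means "may crawl".
EXPECTED = {
    "GPTBot": False,
    "ChatGPT-User": True,
    "OAI-SearchBot": True,
    "ClaudeBot": True,
    "PerplexityBot": False,
    "Google-Extended": False,
    "Googlebot": True,  # no group of its own and no "*" group, so allowed
}


def audit(robots_txt: str, expected: dict[str, bool],
          url: str = "https://example.com/") -> list[str]:
    """Return the agents whose parsed access differs from the expected value."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent, allow in expected.items()
            if parser.can_fetch(agent, url) != allow]


if __name__ == "__main__":
    problems = audit(TEMPLATE, EXPECTED)
    print("template matches intent" if not problems else f"mismatches: {problems}")
```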

Frequently Asked Questions

Can I block AI crawlers but keep search visibility?

Partially. You can block AI training crawlers (GPTBot, ClaudeBot, Google-Extended) while allowing search crawlers. But blocking all AI crawlers may reduce your visibility in AI-powered search features.

Why doesn't Crawl-delay work for most AI crawlers?

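Crawl-delay was never part of the robots.txt standard (RFC 9309 does not define it), so each crawler operator decides whether to honor it. Among the AI crawlers covered here, only ClaudeBot does; others such as GPTBot and PerplexityBot ignore the directive.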

Should I block PerplexityBot?

Consider it, given the documented compliance issues. Note, however, that Perplexity can still access your content through Google and Bing APIs even if you block the crawler.

Does blocking Google-Extended affect my search rankings?

No. Google-Extended only affects AI training and Gemini grounding. Regular search indexing and AI Overviews are unaffected.

About SEO ProCheck

Technical SEO consulting and GEO strategy with 20 years of enterprise experience. Case studies, resources, and tools for search and AI visibility.
