AI Crawler

A web crawler operated by an AI company to gather content for training or for live answer-engine retrieval. Distinct from traditional search crawlers like Googlebot.

AI crawlers serve two purposes. Training crawlers build the corpus used to train the next model version. Retrieval crawlers fetch fresh content at query time for RAG-style (retrieval-augmented generation) responses. Perplexity, ChatGPT browsing, and Gemini all use retrieval crawlers; OpenAI and Anthropic also operate training crawlers.
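The training/retrieval split can be sketched as a lookup on the user-agent string. This is a minimal illustration, not a vendor-maintained list: the function name classify_crawler and the purpose labels are hypothetical, and the tokens shown (GPTBot, ClaudeBot, ChatGPT-User, OAI-SearchBot, PerplexityBot) are assumptions that should be checked against each vendor's current documentation, since crawler names and roles change.

```python
from typing import Optional

# Illustrative mapping only (an assumption, not an authoritative list).
# Verify tokens and purposes against each vendor's published crawler docs.
AI_CRAWLER_PURPOSE = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "ChatGPT-User": "retrieval",
    "OAI-SearchBot": "retrieval",
    "PerplexityBot": "retrieval",
}


def classify_crawler(user_agent: str) -> Optional[str]:
    """Return 'training' or 'retrieval' if a known token appears, else None."""
    ua = user_agent.lower()
    for token, purpose in AI_CRAWLER_PURPOSE.items():
        # Case-insensitive substring match on the crawler token.
        if token.lower() in ua:
            return purpose
    return None
```

User-agent strings can be spoofed, so a match here is a hint rather than proof of who is making the request.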

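Identify them by user agent and rate-limit them as you would any other bot. Block selectively if needed, but understand the trade-off: blocking GPTBot, OpenAI's training crawler, means pages it has not already collected are unlikely to be included in the training data for future ChatGPT models.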
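Selective blocking is usually expressed in robots.txt. The sketch below uses Python's standard urllib.robotparser to check what a given rule set would permit. The ROBOTS_TXT content, the is_allowed helper, and the example.com URL are hypothetical, chosen only to show blocking one training crawler while leaving other agents alone.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/blog/post"
    print("GPTBot:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("PerplexityBot:", is_allowed(ROBOTS_TXT, "PerplexityBot", page))
```

Compliance with robots.txt is voluntary on the crawler's side, so this only verifies what the rules say, not what a given bot will actually do.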
