A web crawler operated by an AI company to gather content for training or for live answer-engine retrieval. Distinct from traditional search crawlers like Googlebot.
AI crawlers serve one of two purposes: training crawlers build the corpus used to train the next model version, and retrieval crawlers fetch fresh content at query time for RAG-style responses. Perplexity, ChatGPT browsing, and Gemini all use retrieval crawlers; OpenAI and Anthropic also operate training crawlers.
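Identify them by user agent, and rate-limit or block them selectively if needed. Understand the trade-off, though: GPTBot is OpenAI's training crawler, so blocking it keeps your content out of the data used to train future OpenAI models, while blocking a retrieval crawler keeps your pages out of live answers.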
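As a rough illustration of identifying crawlers by user agent and blocking selectively, here is a minimal Python sketch. The token lists are assumptions based on publicly documented crawler names and should be checked against each vendor's current documentation; the helper names (`classify_user_agent`, `is_allowed`) and the example robots.txt are hypothetical, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; verify against each vendor's published docs.
TRAINING_TOKENS = ("GPTBot", "ClaudeBot")
RETRIEVAL_TOKENS = ("OAI-SearchBot", "ChatGPT-User", "PerplexityBot")


def classify_user_agent(user_agent: str) -> str:
    """Return 'training', 'retrieval', or 'other' for a User-Agent header."""
    ua = user_agent.lower()
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "training"
    if any(token.lower() in ua for token in RETRIEVAL_TOKENS):
        return "retrieval"
    return "other"


# Hypothetical robots.txt: block one training crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.1)"))
    print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post"))
    print(is_allowed(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/post"))
```

Keep in mind that robots.txt is advisory and user-agent strings can be spoofed, so treat this as a first-pass filter. Stricter enforcement would need server-side checks, such as comparing request IPs against the ranges a vendor publishes.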