AI Crawler — VibecodeAEO Glossary

AI crawlers split into two purposes: training crawlers (build the corpus used to train the next model version) and retrieval crawlers (fetch fresh content at query time for RAG-style responses). Perplexity, ChatGPT browsing, and Gemini all use retrieval crawlers; OpenAI and Anthropic also operate training crawlers.

Identify them by user agent and respect their rate limits. Block selectively if needed, but understand the trade-off: blocking GPTBot keeps your pages out of OpenAI's training crawl, so future ChatGPT models are less likely to have your content.

Real-world example

A brand checks its server logs after an AEO audit and discovers PerplexityBot has visited 0 pages in the last 30 days. Investigation reveals an overly broad robots.txt Disallow rule added during a replatform. After the rule is fixed, PerplexityBot resumes crawling within 7 days, and citations follow within 2 weeks.
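A minimal sketch of how such a misconfiguration could be caught, using Python's standard-library `urllib.robotparser`. The robots.txt content, the `AI_CRAWLERS` list, and the `blocked_crawlers` helper are hypothetical and only illustrate the check; they are not taken from a real site or from VibecodeAEO's audit.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in this glossary entry; extend as needed.
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot"]

# Hypothetical robots.txt with an overly broad catch-all rule,
# like one that might be left behind by a replatform.
BROKEN_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Return the AI crawlers that may not fetch `path` under this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, path)]

print(blocked_crawlers(BROKEN_ROBOTS_TXT))
```

Here the catch-all `User-agent: *` group blocks every crawler that has no group of its own, which is why all four AI crawlers are reported as blocked even though none is named in the file.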

Frequently asked questions

How can I tell which AI crawlers are visiting my site?

Check your server access logs filtered by user agent. Key user agents to look for: GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot, and Bytespider. Google-Extended is different: it is a robots.txt control token rather than a request user agent, so it will not show up in logs. VibecodeAEO also audits your robots.txt to show you which crawlers are allowed vs. blocked and what impact each has on your citation visibility.
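As a rough sketch, the log check could look like the following. It assumes access logs in the common "combined" format, where the user agent is the last double-quoted field; the sample lines, IP addresses, and simplified user-agent strings are invented for illustration, and `count_ai_crawler_hits` is a hypothetical helper.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field.
AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot", "Bytespider"]

# Assumption: combined log format, user agent is the last quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')

def count_ai_crawler_hits(log_lines):
    """Count requests per AI crawler across an iterable of access-log lines."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
                break
    return hits

# Invented sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Example (compatible; GPTBot)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 900 "-" "Example (compatible; PerplexityBot)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:04 +0000] "GET /docs HTTP/1.1" 200 700 "-" "Example (compatible; GPTBot)"',
]

for crawler, count in count_ai_crawler_hits(SAMPLE_LOG).items():
    print(crawler, count)
```

User-agent strings can be spoofed, so counts like these are a starting point rather than proof of who is crawling.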

Should I block AI training crawlers to protect my content?

This is a business decision. Blocking training crawlers prevents your content from entering AI model training corpora, which may reduce future citations in ChatGPT and Claude. Most B2B brands benefit more from being cited than from withholding content. Retrieval crawlers (Perplexity, Bing) should almost never be blocked.
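One way to express that decision is to generate the robots.txt rules from a single switch. The sketch below is illustrative only: the split of tokens into `TRAINING_CRAWLERS` and `RETRIEVAL_CRAWLERS` is an assumption made for this example and should be checked against each operator's current documentation, and `build_robots_txt` is a hypothetical helper.

```python
# Assumed split, for illustration only; confirm each token's purpose
# with the crawler operator before relying on it.
TRAINING_CRAWLERS = ["GPTBot", "ClaudeBot"]
RETRIEVAL_CRAWLERS = ["ChatGPT-User", "PerplexityBot"]

def build_robots_txt(block_training: bool) -> str:
    """Build robots.txt groups that always allow retrieval crawlers and
    optionally disallow training crawlers site-wide."""
    groups = []
    for agent in RETRIEVAL_CRAWLERS:
        groups.append(f"User-agent: {agent}\nAllow: /")
    for agent in TRAINING_CRAWLERS:
        rule = "Disallow: /" if block_training else "Allow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    # Groups are separated by blank lines, one group per crawler.
    return "\n\n".join(groups) + "\n"

print(build_robots_txt(block_training=True))
```

Giving each crawler its own group keeps the policy explicit and avoids relying on a catch-all `User-agent: *` rule, which is the kind of rule that tends to block more than intended.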

Related terms

robots.txt — A plain-text file at /robots.txt that tells web crawlers (search engines and AI bots) which pages they may access. Misconfiguring it is one of the most common ways brands accidentally block themselves from AI engines.
llms.txt — A plain-text file at the root of a domain (like /robots.txt) that gives AI systems a curated, machine-readable summary of your brand, products, and key documentation links.
Answer Engine — An AI system that responds to natural-language queries with a synthesized answer rather than a list of links — examples include ChatGPT, Perplexity, Google Gemini, Microsoft Copilot, and Anthropic Claude.
