PureDevTools

robots.txt Generator

Create robots.txt rules for any crawler — allow, disallow, crawl-delay, sitemap, and AI crawler blocks

All processing happens in your browser. No data is sent to any server.

Build your robots.txt by adding user-agent groups and path rules

Presets

Apply a preset to quickly populate the config — existing rules will be replaced


Sitemap URLs

One absolute URL per line — added as Sitemap: directives at the end

Generated robots.txt
# Generated by PureDevTools robots.txt Generator
# https://puredevtools.tools/robots-txt-generator

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

You want to block AI crawlers (GPTBot, CCBot, Google-Extended) from scraping your content, allow Googlebot to index everything except /admin/ and /api/, set a crawl delay for aggressive bots, and include your sitemap URL. Writing robots.txt by hand means knowing the exact user-agent strings, the Allow/Disallow precedence rules, and the Sitemap directive syntax.

Why This Generator (Not a Text Editor)

Getting robots.txt syntax wrong means either blocking search engines from your entire site or leaving it wide open to scrapers. This tool provides a visual interface — add user-agent groups, toggle Allow/Disallow paths, set crawl-delay, add sitemap URLs, and one-click block common AI crawlers. No syntax errors possible. Everything runs in your browser.

What Is robots.txt?

robots.txt is a plain-text file placed at the root of your website — always at https://yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol to tell web crawlers which pages or directories they are allowed or not allowed to access.

The protocol is widely respected by legitimate crawlers (Googlebot, Bingbot, AhrefsBot, GPTBot) but is advisory, not enforced. Malicious bots may ignore it. For sensitive content, use server-side access control instead.

robots.txt Syntax Reference

A robots.txt file consists of one or more groups. Each group targets one or more user agents and lists their path rules:

User-agent: *
Allow: /
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Directives

Directive     Description
User-agent:   The crawler this group applies to. * matches all crawlers.
Allow:        Explicitly permit access to a path, even if a broader Disallow covers it.
Disallow:     Prevent crawlers from accessing a path. An empty value means disallow nothing.
Crawl-delay:  Seconds to wait between requests (ignored by Googlebot).
Sitemap:      Absolute URL to your XML sitemap. Helps crawlers discover all pages.
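To see how the directives combine, here is a sketch of a single group using all of them (the crawler name and paths are illustrative, not a recommendation):

```
# Throttle one crawler and carve out a subtree
User-agent: Bingbot
Crawl-delay: 5
Allow: /blog/
Disallow: /blog/drafts/

# Sitemap sits outside any group
Sitemap: https://example.com/sitemap.xml
```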

Allow vs Disallow: Specificity Rules

When both Allow and Disallow rules match a URL, the most specific rule wins (longest matching path). In case of a tie, Allow takes precedence.

User-agent: *
Disallow: /private/
Allow: /private/public.html

This blocks the entire /private/ directory but explicitly allows /private/public.html. More specific rules always override broader ones.
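The precedence logic can be modeled in a few lines of Python. This is a simplified sketch (decide is a hypothetical helper; real matchers also handle * wildcards and $ anchors):

```python
def decide(rules, path):
    """Return True if path may be crawled, per longest-match precedence.

    rules: list of (directive, prefix) pairs, e.g. ("Disallow", "/private/").
    The longest matching prefix wins; on a tie, Allow beats Disallow.
    """
    best = None  # (prefix_length, is_allow)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/private/"), ("Allow", "/private/public.html")]
print(decide(rules, "/private/secret.html"))   # False: only Disallow matches
print(decide(rules, "/private/public.html"))   # True: the longer Allow wins
print(decide(rules, "/about"))                 # True: no rule matches at all
```

Note the tuple comparison: length is compared first, and on equal lengths the Allow flag (True) sorts above Disallow (False), which encodes the tie-break rule directly.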

The Wildcard User-agent

User-agent: * matches every crawler that does not have its own dedicated group. Place it first or last — order between groups does not matter. Only one group per user-agent applies (the most specific one that matches).

# Allow everything for most crawlers
User-agent: *
Disallow:

# Block AI training crawlers specifically
User-agent: GPTBot
Disallow: /
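Group selection can be sketched the same way. This is a simplified model (select_group is a hypothetical helper; real crawlers match against their product token, case-insensitively):

```python
def select_group(groups, user_agent):
    """Pick the single rule group that applies to a crawler.

    groups: dict mapping user-agent token (e.g. "GPTBot", "*") -> rules.
    The most specific matching token wins; "*" is the fallback.
    """
    ua = user_agent.lower()
    matches = [t for t in groups if t != "*" and t.lower() in ua]
    if matches:
        return groups[max(matches, key=len)]   # longest token = most specific
    return groups.get("*")

groups = {"*": ["Disallow:"], "GPTBot": ["Disallow: /"]}
print(select_group(groups, "GPTBot/1.1"))      # ['Disallow: /']
print(select_group(groups, "Bingbot/2.0"))     # ['Disallow:']
```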

Blocking AI Crawlers

As of 2024, major AI companies have introduced dedicated crawler user-agent strings that you can block via robots.txt. This controls whether your content is used to train large language models.

Crawler          Company             User-agent
GPTBot           OpenAI              GPTBot
CCBot            Common Crawl        CCBot
Google-Extended  Google (Gemini)     Google-Extended
anthropic-ai     Anthropic           anthropic-ai
Claude-Web       Anthropic           Claude-Web
Bytespider       ByteDance/TikTok    Bytespider
PerplexityBot    Perplexity          PerplexityBot

Use the Block AI Crawlers preset in this tool to generate all of these rules at once.
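The preset's output looks roughly like this — the Robots Exclusion Protocol lets you stack several User-agent lines onto one shared rule, though the tool's exact formatting may differ:

```
# Block known AI training crawlers
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: anthropic-ai
User-agent: Claude-Web
User-agent: Bytespider
User-agent: PerplexityBot
Disallow: /
```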

Important: Blocking GPTBot prevents OpenAI’s crawler from fetching new content going forward. It does not retroactively remove content already used in training.

Disallow Everything vs Allow Everything

Allow everything (default for most sites):

User-agent: *
Disallow:

An empty Disallow: value means “disallow nothing” — all paths are accessible.

Block everything (staging sites, private intranets):

User-agent: *
Disallow: /

Disallow: / blocks the root and all paths under it. Useful for preventing indexing of development environments.

Crawl-delay

Crawl-delay sets the minimum number of seconds between requests from a specific crawler:

User-agent: AhrefsBot
Crawl-delay: 10

Gotcha: Google ignores Crawl-delay, and Search Console's crawl-rate limiter was retired in January 2024 — Googlebot now adjusts its rate automatically based on how your server responds. For other bots, Crawl-delay is the only standard way to throttle.

Sitemap Directive

The Sitemap: directive tells crawlers where to find your XML sitemap. It must be an absolute URL and can appear anywhere in the file (not inside a user-agent group):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml

Including your sitemap URL in robots.txt is one of the most reliable ways to ensure crawlers discover it automatically, without manually submitting it to each search engine.

What robots.txt Cannot Do

robots.txt controls crawling, not indexing. A page you disallow can still appear in search results if other sites link to it — Google may list the bare URL without a snippet. It also provides no security: the file is public, so disallowed paths are visible to anyone and can even advertise where sensitive areas live. To keep a page out of search results, use a noindex meta tag or X-Robots-Tag header; to protect content, use authentication.

robots.txt vs Meta Robots Tag

Method                Scope                              Controls
robots.txt            File-level (entire URL paths)      Crawling (can the bot fetch this?)
<meta name="robots">  Per-page                           Indexing (should this page appear in search?)
X-Robots-Tag header   Per-resource (including non-HTML)  Indexing (same as meta, but via HTTP header)

For most sites, robots.txt handles crawl efficiency while meta robots handles indexing decisions. Use both for complete control.
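For reference, the per-page control from the table looks like this (the content value is illustrative):

```
<!-- In a page's <head>: keep it out of search results, but follow its links -->
<meta name="robots" content="noindex, follow">
```

The equivalent for non-HTML resources such as PDFs is the HTTP response header X-Robots-Tag: noindex, set in your server configuration.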

Testing Your robots.txt

After deploying, verify your file at https://yourdomain.com/robots.txt. Google Search Console's robots.txt report shows whether Google fetched your file successfully, which version it is using, and any parse errors (it replaced the older robots.txt Tester tool in late 2023).
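You can also sanity-check rules locally with Python's standard-library robotparser before deploying. One caveat: it uses first-match rather than longest-match semantics, so results for overlapping Allow/Disallow pairs may differ from Google's:

```python
from urllib import robotparser

# Parse rules from a string instead of fetching a live URL
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/
""".splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/about"))        # True
```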

Common mistakes to check:

- A leftover Disallow: / from a staging deployment blocking the entire site
- The file not living at the site root, or returning a non-200 status code
- Case mismatches — paths in robots.txt are case-sensitive (/Admin/ does not match /admin/)
- Relying on a noindex directive inside robots.txt (Google stopped honoring it in 2019)
