robots.txt Generator
Create robots.txt rules for any crawler — allow, disallow, crawl-delay, sitemap, and AI crawler blocks
You want to block AI crawlers (GPTBot, CCBot, Google-Extended) from scraping your content, allow Googlebot to index everything except /admin/ and /api/, set a crawl delay for aggressive bots, and include your sitemap URL. Writing robots.txt by hand means knowing the exact user-agent strings, the Allow/Disallow precedence rules, and the Sitemap directive syntax.
Why This Generator (Not a Text Editor)
Getting robots.txt syntax wrong means either blocking search engines from your entire site or leaving it wide open to scrapers. This tool provides a visual interface — add user-agent groups, toggle Allow/Disallow paths, set crawl-delay, add sitemap URLs, and one-click block common AI crawlers. No syntax errors possible. Everything runs in your browser.
What Is robots.txt?
robots.txt is a plain-text file placed at the root of your website — always at https://yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol to tell web crawlers which pages or directories they are allowed or not allowed to access.
The protocol is widely respected by legitimate crawlers (Googlebot, Bingbot, AhrefsBot, GPTBot) but is advisory, not enforced. Malicious bots may ignore it. For sensitive content, use server-side access control instead.
robots.txt Syntax Reference
A robots.txt file consists of one or more groups. Each group targets one or more user agents and lists their path rules:
```
User-agent: *
Allow: /
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```
Directives
| Directive | Description |
|---|---|
| `User-agent:` | The crawler this group applies to. `*` matches all crawlers. |
| `Allow:` | Explicitly permit access to a path, even if a broader `Disallow` covers it. |
| `Disallow:` | Prevent crawlers from accessing a path. An empty value means "disallow nothing". |
| `Crawl-delay:` | Seconds to wait between requests (ignored by Googlebot). |
| `Sitemap:` | Absolute URL to your XML sitemap. Helps crawlers discover all pages. |
Allow vs Disallow: Specificity Rules
When both Allow and Disallow rules match a URL, the most specific rule wins (longest matching path). In case of a tie, Allow takes precedence.
```
User-agent: *
Disallow: /private/
Allow: /private/public.html
```
This blocks the entire /private/ directory but explicitly allows /private/public.html. More specific rules always override broader ones.
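You can sanity-check precedence with Python's standard-library `urllib.robotparser`. One caveat worth noting: the stdlib parser applies rules in file order (first match wins) rather than Google's longest-match rule, so in the sketch below the more specific `Allow` line is listed first, which keeps both interpretations in agreement:

```python
from urllib import robotparser

# Allow is listed first so that Google's longest-match semantics and the
# stdlib's first-match semantics give the same answer for both URLs.
rules = """\
User-agent: *
Allow: /private/public.html
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "https://example.com/private/public.html"))  # True
```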
The Wildcard User-agent
User-agent: * matches every crawler that does not have its own dedicated group. Place it first or last — order between groups does not matter. Only one group per user-agent applies (the most specific one that matches).
```
# Allow everything for most crawlers
User-agent: *
Disallow:

# Block AI training crawlers specifically
User-agent: GPTBot
Disallow: /
```
Blocking AI Crawlers
As of 2024, major AI companies have introduced dedicated crawler user-agent strings that you can block via robots.txt. This controls whether your content is used to train large language models.
| Crawler | Company | User-agent |
|---|---|---|
| GPTBot | OpenAI | GPTBot |
| CCBot | Common Crawl | CCBot |
| Google-Extended | Google (Gemini) | Google-Extended |
| anthropic-ai | Anthropic | anthropic-ai |
| Claude-Web | Anthropic | Claude-Web |
| Bytespider | ByteDance/TikTok | Bytespider |
| PerplexityBot | Perplexity | PerplexityBot |
Use the Block AI Crawlers preset in this tool to generate all of these rules at once.
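For reference, the combined block for the crawlers in the table above looks roughly like this (one group per crawler; a single group listing several `User-agent:` lines is equally valid):

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PerplexityBot
Disallow: /
```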
Important: Blocking GPTBot prevents OpenAI’s crawler from fetching new content going forward. It does not retroactively remove content already used in training.
Disallow Everything vs Allow Everything
Allow everything (default for most sites):
```
User-agent: *
Disallow:
```
An empty Disallow: value means “disallow nothing” — all paths are accessible.
Block everything (staging sites, private intranets):
```
User-agent: *
Disallow: /
```
Disallow: / blocks the root and all paths under it. Useful for preventing indexing of development environments.
Crawl-delay
Crawl-delay sets the minimum number of seconds between requests from a specific crawler:
```
User-agent: AhrefsBot
Crawl-delay: 10
```
Gotcha: Google ignores Crawl-delay; to reduce Googlebot’s crawl rate, use the crawl settings in Google Search Console. For other bots, Crawl-delay is the de facto way to throttle, though it is not part of the official standard (RFC 9309) and support varies by crawler.
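If you audit robots.txt files programmatically, Python's `urllib.robotparser` also exposes this directive. A minimal sketch (the value comes back only for groups that actually declare it):

```python
from urllib import robotparser

rules = """\
User-agent: AhrefsBot
Crawl-delay: 10
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.crawl_delay("AhrefsBot"))  # 10
print(rp.crawl_delay("Googlebot"))  # None (no matching group, no default)
```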
Sitemap Directive
The Sitemap: directive tells crawlers where to find your XML sitemap. It must be an absolute URL and can appear anywhere in the file (not inside a user-agent group):
```
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml
```
Including your sitemap URL in robots.txt is one of the most reliable ways to ensure all major search engines discover it automatically, without submitting it to each one separately.
What robots.txt Cannot Do
- **Prevent indexing:** Blocking a URL in robots.txt prevents crawling but not indexing. Google can still index a URL it cannot crawl if it finds links pointing to it. Use `noindex` meta tags or `X-Robots-Tag` HTTP headers to prevent indexing.
- **Protect sensitive data:** Any URL in your robots.txt is publicly visible, including your `Disallow` paths. Never rely on robots.txt as a security measure.
- **Stop malicious bots:** Bad actors ignore robots.txt. Use IP blocking, rate limiting, and WAF rules for actual security.
robots.txt vs Meta Robots Tag
| Method | Scope | Controls |
|---|---|---|
| robots.txt | File-level (entire URL paths) | Crawling (can the bot fetch this?) |
| `<meta name="robots">` | Per-page | Indexing (should this page appear in search?) |
| `X-Robots-Tag` header | Per-resource (including non-HTML) | Indexing (same as meta, but via HTTP header) |
For most sites, robots.txt handles crawl efficiency while meta robots handles indexing decisions. Use both for complete control.
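For comparison, the two indexing-level controls look like this (the header is shown as it would appear in an HTTP response):

```
<!-- Per-page, in the HTML <head> -->
<meta name="robots" content="noindex, follow">

<!-- Per-resource, sent as an HTTP response header (works for PDFs, images, etc.) -->
X-Robots-Tag: noindex
```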
Testing Your robots.txt
After deploying, verify your file is reachable at https://yourdomain.com/robots.txt. Google Search Console’s robots.txt report shows the version Googlebot last fetched and any parse errors, and its URL Inspection tool tells you whether a specific URL is blocked for Googlebot.
Common mistakes to check:
- Accidentally disallowing CSS or JavaScript (prevents proper rendering)
- Wrong path format (paths must start with `/`)
- Blocking your sitemap location itself
- Case sensitivity: `/Admin/` and `/admin/` are treated as different paths