PureDevTools

robots.txt Generator

Create robots.txt rules for any crawler — allow, disallow, crawl-delay, sitemap, and AI crawler blocks

All processing happens in your browser. No data is sent to any server.

Build your robots.txt by adding user-agent groups and path rules

Presets

Apply a preset to quickly populate the config — existing rules will be replaced


Sitemap URLs

One absolute URL per line — added as Sitemap: directives at the end

Generated robots.txt
# Generated by PureDevTools robots.txt Generator
# https://puredevtools.tools/robots-txt-generator

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

You want to block AI crawlers (GPTBot, CCBot, Google-Extended) from scraping your content, allow Googlebot to index everything except /admin/ and /api/, set a crawl delay for aggressive bots, and include your sitemap URL. Writing robots.txt by hand means knowing the exact user-agent strings, the Allow/Disallow precedence rules, and the Sitemap directive syntax.

Why This Generator (Not a Text Editor)

Getting robots.txt syntax wrong means either blocking search engines from your entire site or leaving it wide open to scrapers. This tool provides a visual interface — add user-agent groups, toggle Allow/Disallow paths, set crawl-delay, add sitemap URLs, and one-click block common AI crawlers. No syntax errors possible. Everything runs in your browser.

What Is robots.txt?

robots.txt is a plain-text file placed at the root of your website — always at https://yourdomain.com/robots.txt. It uses the Robots Exclusion Protocol to tell web crawlers which pages or directories they are allowed or not allowed to access.

The protocol is widely respected by legitimate crawlers (Googlebot, Bingbot, AhrefsBot, GPTBot) but is advisory, not enforced. Malicious bots may ignore it. For sensitive content, use server-side access control instead.

robots.txt Syntax Reference

A robots.txt file consists of one or more groups. Each group targets one or more user agents and lists their path rules:

User-agent: *
Allow: /
Disallow: /admin/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml

Directives

Directive     Description
User-agent:   The crawler this group applies to. * matches all crawlers.
Allow:        Explicitly permit access to a path, even if a broader Disallow covers it.
Disallow:     Prevent crawlers from accessing a path. An empty value means disallow nothing.
Crawl-delay:  Seconds to wait between requests (ignored by Googlebot).
Sitemap:      Absolute URL to your XML sitemap. Helps crawlers discover all pages.
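To see how the directives combine, here is a sketch of a single group using all of them (the crawler name and paths are illustrative, not a recommendation):

```
# Throttle one crawler and carve out a subtree
User-agent: Bingbot
Crawl-delay: 5
Allow: /blog/
Disallow: /blog/drafts/

# Sitemap sits outside any group
Sitemap: https://example.com/sitemap.xml
```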

Allow vs Disallow: Specificity Rules

When both Allow and Disallow rules match a URL, the most specific rule wins (longest matching path). In case of a tie, Allow takes precedence.

User-agent: *
Disallow: /private/
Allow: /private/public.html

This blocks the entire /private/ directory but explicitly allows /private/public.html. More specific rules always override broader ones.
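The precedence logic can be modeled in a few lines of Python. This is a simplified sketch (decide is a hypothetical helper; real matchers also handle * wildcards and $ anchors):

```python
def decide(rules, path):
    """Return True if path may be crawled, per longest-match precedence.

    rules: list of (directive, prefix) pairs, e.g. ("Disallow", "/private/").
    The longest matching prefix wins; on a tie, Allow beats Disallow.
    """
    best = None  # (prefix_length, is_allow)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive == "Allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("Disallow", "/private/"), ("Allow", "/private/public.html")]
print(decide(rules, "/private/secret.html"))   # False: only Disallow matches
print(decide(rules, "/private/public.html"))   # True: the longer Allow wins
print(decide(rules, "/about"))                 # True: no rule matches at all
```

Note the tuple comparison: length is compared first, and on equal lengths the Allow flag (True) sorts above Disallow (False), which encodes the tie-break rule directly.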

The Wildcard User-agent

User-agent: * matches every crawler that does not have its own dedicated group. Place it first or last — order between groups does not matter. Only one group per user-agent applies (the most specific one that matches).

# Allow everything for most crawlers
User-agent: *
Disallow:

# Block AI training crawlers specifically
User-agent: GPTBot
Disallow: /
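Group selection can be sketched the same way. This is a simplified model (select_group is a hypothetical helper; real crawlers match against their product token, case-insensitively):

```python
def select_group(groups, user_agent):
    """Pick the single rule group that applies to a crawler.

    groups: dict mapping user-agent token (e.g. "GPTBot", "*") -> rules.
    The most specific matching token wins; "*" is the fallback.
    """
    ua = user_agent.lower()
    matches = [t for t in groups if t != "*" and t.lower() in ua]
    if matches:
        return groups[max(matches, key=len)]   # longest token = most specific
    return groups.get("*")

groups = {"*": ["Disallow:"], "GPTBot": ["Disallow: /"]}
print(select_group(groups, "GPTBot/1.1"))      # ['Disallow: /']
print(select_group(groups, "Bingbot/2.0"))     # ['Disallow:']
```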

Blocking AI Crawlers

As of 2024, major AI companies have introduced dedicated crawler user-agent strings that you can block via robots.txt. This controls whether your content is used to train large language models.

Crawler          Company             User-agent
GPTBot           OpenAI              GPTBot
CCBot            Common Crawl        CCBot
Google-Extended  Google (Gemini)     Google-Extended
anthropic-ai     Anthropic           anthropic-ai
Claude-Web       Anthropic           Claude-Web
Bytespider       ByteDance/TikTok    Bytespider
PerplexityBot    Perplexity          PerplexityBot

Use the Block AI Crawlers preset in this tool to generate all of these rules at once.
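The preset's output looks roughly like this — the Robots Exclusion Protocol lets you stack several User-agent lines onto one shared rule, though the tool's exact formatting may differ:

```
# Block known AI training crawlers
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: anthropic-ai
User-agent: Claude-Web
User-agent: Bytespider
User-agent: PerplexityBot
Disallow: /
```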

Important: Blocking GPTBot prevents OpenAI’s crawler from fetching new content going forward. It does not retroactively remove content already used in training.

Disallow Everything vs Allow Everything

Allow everything (default for most sites):

User-agent: *
Disallow:

An empty Disallow: value means “disallow nothing” — all paths are accessible.

Block everything (staging sites, private intranets):

User-agent: *
Disallow: /

Disallow: / blocks the root and all paths under it. Useful for preventing indexing of development environments.

Crawl-delay

Crawl-delay sets the minimum number of seconds between requests from a specific crawler:

User-agent: AhrefsBot
Crawl-delay: 10

Gotcha: Google ignores Crawl-delay, and Search Console's crawl-rate limiter was retired in January 2024 — Googlebot now adjusts its rate automatically based on how your server responds. For other bots, Crawl-delay is the only standard way to throttle.

Sitemap Directive

The Sitemap: directive tells crawlers where to find your XML sitemap. It must be an absolute URL and can appear anywhere in the file (not inside a user-agent group):

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-images.xml

Including your sitemap URL in robots.txt is one of the most reliable ways to ensure crawlers discover it automatically, without manually submitting it to each search engine.

What robots.txt Cannot Do

robots.txt controls crawling, not indexing. A page you disallow can still appear in search results if other sites link to it — Google may list the bare URL without a snippet. It also provides no security: the file is public, so disallowed paths are visible to anyone and can even advertise where sensitive areas live. To keep a page out of search results, use a noindex meta tag or X-Robots-Tag header; to protect content, use authentication.

robots.txt vs Meta Robots Tag

Method                Scope                              Controls
robots.txt            File-level (entire URL paths)      Crawling (can the bot fetch this?)
<meta name="robots">  Per-page                           Indexing (should this page appear in search?)
X-Robots-Tag header   Per-resource (including non-HTML)  Indexing (same as meta, but via HTTP header)

For most sites, robots.txt handles crawl efficiency while meta robots handles indexing decisions. Use both for complete control.
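For reference, the per-page control from the table looks like this (the content value is illustrative):

```
<!-- In a page's <head>: keep it out of search results, but follow its links -->
<meta name="robots" content="noindex, follow">
```

The equivalent for non-HTML resources such as PDFs is the HTTP response header X-Robots-Tag: noindex, set in your server configuration.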

Testing Your robots.txt

After deploying, verify your file at https://yourdomain.com/robots.txt. Google Search Console's robots.txt report shows whether Google fetched your file successfully, which version it is using, and any parse errors (it replaced the older robots.txt Tester tool in late 2023).
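You can also sanity-check rules locally with Python's standard-library robotparser before deploying. One caveat: it uses first-match rather than longest-match semantics, so results for overlapping Allow/Disallow pairs may differ from Google's:

```python
from urllib import robotparser

# Parse rules from a string instead of fetching a live URL
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin/
""".splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/about"))        # True
```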

Common mistakes to check:

- A leftover Disallow: / from a staging deployment blocking the entire site
- The file not living at the site root, or returning a non-200 status code
- Case mismatches — paths in robots.txt are case-sensitive (/Admin/ does not match /admin/)
- Relying on a noindex directive inside robots.txt (Google stopped honoring it in 2019)
