robots.txt Tester: Test Crawl Rules for Any Bot

Paste your robots.txt and instantly test whether a URL is allowed or blocked for any crawler: Googlebot, Bingbot, GPTBot, and more. No signup required.


Paste your robots.txt file, enter a URL, choose a crawler user agent, and find out in seconds whether that path would be crawled or blocked. The matching rule is highlighted directly in the editor so you always know exactly which directive made the decision.

What is a robots.txt file and why does it matter?

A robots.txt file is a plain-text file placed at the root of a website (e.g. https://example.com/robots.txt) that tells web crawlers which pages they may or may not visit. Search engines such as Google and Bing check this file before crawling your site. Correctly configuring robots.txt lets you protect sensitive paths, reduce server load from bots, and control which content gets indexed.

How do you test a robots.txt file without going live?

The easiest way is to paste the file contents into an online robots.txt tester like this one. Enter the URL path you want to evaluate, select the crawler (e.g. Googlebot), and click Test Crawl Access. The tool parses the file in your browser (no data is sent to a server) and shows you the outcome along with the specific rule that determined it, highlighted inline in the editor.
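You can reproduce the same kind of check offline with Python's standard-library urllib.robotparser. Note that it implements only a basic subset of the rules and does not support the * and $ path wildcards; the file contents and paths below are placeholders:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents to test against.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask whether a given crawler may fetch a given path.
print(parser.can_fetch("Googlebot", "/private/data"))  # False (blocked)
print(parser.can_fetch("Googlebot", "/public/page"))   # True (allowed)
```

In a real deployment you would point RobotFileParser at the live file with set_url() and read(), but parsing pasted text keeps the check fully local, just like this tool.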

What is the correct robots.txt syntax and file format?

A robots.txt file consists of one or more groups. Each group starts with one or more User-agent: lines that name the crawlers the rules apply to, followed by Allow: and Disallow: directives listing permitted and blocked paths. Groups are conventionally separated by blank lines. Lines beginning with # are comments. The file must be UTF-8 encoded and served at the exact path /robots.txt with content type text/plain.
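A minimal file following this structure (the paths here are purely illustrative) might look like:

```
# Rules for all crawlers without a dedicated group
User-agent: *
Disallow: /admin/

# A dedicated group for one crawler
User-agent: Googlebot
Disallow: /admin/
Disallow: /drafts/
```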

How do Allow and Disallow rules work?

Disallow: /path/ prevents a crawler from visiting that path and everything beneath it. Allow: /path/ grants access even within a disallowed parent directory. When multiple rules match a URL, the most specific one (longest matching path) takes effect. If two rules have equal length, Allow wins over Disallow.
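This precedence logic can be sketched in a few lines of Python. This is an illustrative simplification, assuming plain path prefixes with no wildcards; decide and the rule tuples are invented names, not part of any real parser:

```python
def decide(rules, path):
    """Google-style precedence: the longest matching rule wins;
    on a length tie, Allow beats Disallow. `rules` is a list of
    ("allow" | "disallow", path_prefix) tuples."""
    best = ("allow", "")  # default: allowed when nothing matches
    for kind, prefix in rules:
        if path.startswith(prefix):
            longer = len(prefix) > len(best[1])
            tie_allow = len(prefix) == len(best[1]) and kind == "allow"
            if longer or tie_allow:
                best = (kind, prefix)
    return best[0] == "allow"

rules = [("disallow", "/private/"), ("allow", "/private/help/")]
print(decide(rules, "/private/help/faq"))  # True: the longer Allow wins
print(decide(rules, "/private/secret"))    # False: only Disallow matches
```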

How does user-agent matching work in robots.txt?

The User-agent: value is a case-insensitive product token. A rule group applies to a crawler if the token appears anywhere in the crawler's full user-agent string. For example, User-agent: Googlebot matches Googlebot/2.1 (+http://www.google.com/bot.html). Groups with explicit user-agent names always take precedence over the wildcard User-agent: *. You can use the What is my User Agent tool to find your own exact user-agent string.
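The matching rule described above amounts to a case-insensitive substring check, which can be sketched as (group_applies is an illustrative name):

```python
def group_applies(token: str, crawler_ua: str) -> bool:
    """A group applies when its User-agent token appears
    (case-insensitively) anywhere in the crawler's full UA string."""
    return token.lower() in crawler_ua.lower()

ua = "Googlebot/2.1 (+http://www.google.com/bot.html)"
print(group_applies("googlebot", ua))  # True: token matches despite case
print(group_applies("Bingbot", ua))    # False: token absent from UA
```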

What does User-agent: * mean in robots.txt?

User-agent: * is the wildcard group: its rules apply to every crawler that does not have a dedicated group in the file. If you only have a wildcard group, all bots follow the same rules. If a bot like Googlebot has its own group, Googlebot follows only that group, completely ignoring the * group.
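The group-selection behavior can be sketched as follows. This is a simplified model (select_group and the dict layout are illustrative, not a real parser's API):

```python
def select_group(groups: dict, crawler_ua: str) -> list:
    """Pick the rule group for a crawler: a named group matching the
    UA wins outright; otherwise fall back to the '*' group. A bot
    with its own group ignores the '*' group completely."""
    for token, rules in groups.items():
        if token != "*" and token.lower() in crawler_ua.lower():
            return rules
    return groups.get("*", [])

groups = {
    "*": ["Disallow: /tmp/"],
    "Googlebot": ["Disallow: /no-google/"],
}
print(select_group(groups, "Googlebot/2.1"))    # ['Disallow: /no-google/']
print(select_group(groups, "SomeOtherBot/1.0")) # ['Disallow: /tmp/']
```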

How do wildcard patterns work in robots.txt paths?

Paths support two special characters. An asterisk * matches any sequence of characters; for example, Disallow: /*.pdf blocks every URL whose path contains .pdf. A dollar sign $ at the end of a path anchors the match to the end of the URL, so Disallow: /page$ blocks exactly /page but not /page/subpage. Combined, Disallow: /*.pdf$ blocks only URLs that end in .pdf. All other characters are treated literally.
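One common way to implement these patterns is to translate them into regular expressions. A minimal sketch (pattern_to_regex is an illustrative name; unanchored patterns match any path that begins with them):

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern to a regex: '*' becomes
    '.*', a trailing '$' anchors the end of the URL, and every
    other character is matched literally."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile(regex + ("$" if anchored else ""))

print(bool(pattern_to_regex("/*.pdf$").match("/files/report.pdf")))  # True
print(bool(pattern_to_regex("/page$").match("/page/subpage")))       # False
print(bool(pattern_to_regex("/page$").match("/page")))               # True
```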

Which search engine bots should I know about?

The most important crawlers to configure are Googlebot (Google Search), Bingbot (Microsoft Bing), Applebot (Apple Spotlight and Siri), and DuckDuckBot (DuckDuckGo). You can test any of these using the quick-select chips in this tool. Each chip fills in the correct user-agent token so you can verify your rules without looking up exact UA strings.

How do I block AI training crawlers like GPTBot with robots.txt?

To opt out of having your content used for AI training, add a dedicated group for the AI crawler. For example, User-agent: GPTBot followed by Disallow: / blocks OpenAI's GPTBot from the entire site. Other common AI crawlers include Claude-Web (Anthropic) and CCBot (Common Crawl). Use this tester to verify the full GPTBot user-agent string is correctly matched against your rules before deploying.
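Putting that together, an opt-out section might look like this (which AI crawlers you list is up to you):

```
# Opt out of AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```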

What is Crawl-delay in robots.txt and should I use it?

Crawl-delay: tells a crawler to wait a specified number of seconds between requests. It can reduce server load from aggressive bots. Note that Googlebot does not support Crawl-delay - use Google Search Console's crawl rate settings instead. Bingbot and some other crawlers do respect it. This directive is informational and is not parsed by this tester's allow/block logic.
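For crawlers that honor it, the directive sits inside a normal group; the 10-second value here is just an example:

```
User-agent: Bingbot
Crawl-delay: 10
```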
