Robots.txt Tester - Check Crawler Access Rules

Test robots.txt rules to see if a specific user-agent is allowed or blocked from crawling a URL.

How to Use

  • Paste your robots.txt content into the first text area.
  • Enter the user-agent name you want to test (e.g., 'Googlebot', 'Bingbot', or '*' for all crawlers).
  • Enter the full URL you want to test for crawler access.
  • Click 'Test' to see if the specified crawler is allowed or blocked from the URL.
  • The result shows the access decision (Allowed/Disallowed) and the specific rule that matched.
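The same kind of check can be scripted with Python's standard-library `urllib.robotparser`. A minimal sketch, using an illustrative robots.txt and example.com URLs (note that the stdlib parser applies rules in file order rather than strict longest-match precedence, so edge cases may differ from this tool's results):

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt content -- substitute your own.
robots_txt = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Test whether a given user-agent may fetch a given URL.
print(parser.can_fetch("Googlebot", "https://example.com/page.html"))     # True
print(parser.can_fetch("Googlebot", "https://example.com/private/x"))     # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/page.html"))  # False
```

Here Googlebot gets its own rule group, while every other crawler falls through to the '*' group and is blocked entirely.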

About Robots.txt

How Robots.txt Works

The robots.txt file is a standard websites use to tell web crawlers which pages should not be crawled. It must be placed in the root directory of your website (e.g., https://example.com/robots.txt). The file is organized into groups: each group starts with one or more 'User-agent:' lines naming the crawlers it applies to, followed by rules; the wildcard '*' applies to all crawlers. Well-behaved crawlers fetch and honor this file before crawling any page on your site.
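The grouping described above can be sketched with a simplified parser. This is an illustrative sketch only: real parsers also handle wildcard paths and other directives such as Sitemap and Crawl-delay.

```python
def parse_groups(robots_txt):
    """Group Allow/Disallow rules by the user-agent(s) they apply to.

    Simplified sketch: consecutive User-agent lines share one group,
    '#' starts a comment, and unknown directives are ignored.
    """
    groups = {}
    agents = []
    collecting_agents = False  # True while reading a run of User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not collecting_agents:
                agents = []  # a new group begins
            agents.append(value)
            collecting_agents = True
        elif field in ("allow", "disallow"):
            collecting_agents = False
            for agent in agents:
                groups.setdefault(agent, []).append((field, value))
    return groups

groups = parse_groups("User-agent: *\nDisallow: /admin/\nAllow: /admin/help\n")
print(groups)  # {'*': [('disallow', '/admin/'), ('allow', '/admin/help')]}
```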

Allow vs Disallow Rules

Disallow rules block crawlers from accessing matching paths. Allow rules explicitly permit access, overriding broader Disallow rules. When multiple rules match a URL, the most specific rule (the one with the longest matching path) wins, and Allow wins a tie in length; this is the precedence defined in RFC 9309 and followed by major search engines. For example, 'Disallow: /private/' blocks all paths starting with /private/, but 'Allow: /private/public.html' permits that specific page even though its parent directory is disallowed.
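The longest-match precedence can be shown directly. A minimal sketch, assuming plain path-prefix rules (wildcards like '*' and '$' are not handled here):

```python
def decide(rules, path):
    """Apply longest-match precedence to a list of (kind, rule_path)
    tuples, where kind is "allow" or "disallow".

    Among all rules whose path is a prefix of the requested path, the
    longest one wins; Allow wins a length tie (per RFC 9309).
    Returns (allowed, matched_rule); no matching rule means allowed.
    """
    best = None  # (rule_length, is_allow, rule_path) -- tuple order breaks ties
    for kind, rule_path in rules:
        if rule_path and path.startswith(rule_path):
            candidate = (len(rule_path), kind == "allow", rule_path)
            if best is None or candidate > best:
                best = candidate
    if best is None:
        return True, None
    return best[1], ("allow" if best[1] else "disallow", best[2])

rules = [("disallow", "/private/"), ("allow", "/private/public.html")]
print(decide(rules, "/private/secret.html"))  # (False, ('disallow', '/private/'))
print(decide(rules, "/private/public.html"))  # (True, ('allow', '/private/public.html'))
```

Encoding the tie-break in the comparison tuple keeps the loop to a single pass: longer matches sort higher, and at equal length the Allow rule's `True` sorts above `False`.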

Common Robots.txt Patterns

Common robots.txt patterns include: blocking all crawlers ('User-agent: * / Disallow: /'), blocking only Googlebot ('User-agent: Googlebot / Disallow: /'), blocking specific directories ('Disallow: /admin/'), blocking URL parameters ('Disallow: /*?*'), and including a Sitemap directive ('Sitemap: https://example.com/sitemap.xml'). The Sitemap directive helps search engines find your sitemap automatically.
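Putting several of these patterns together, a robots.txt for a hypothetical site (example.com and 'BadBot' are placeholders) might look like:

```
# Block every crawler from admin pages and parameterized URLs
User-agent: *
Disallow: /admin/
Disallow: /*?*

# Block one specific crawler entirely
User-agent: BadBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
```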

Robots.txt Limitations

Robots.txt prevents crawling but does NOT prevent indexing. A page can be indexed even if blocked in robots.txt if other pages link to it. To prevent indexing, use the 'noindex' meta tag or X-Robots-Tag HTTP header instead. Also note that robots.txt is a voluntary standard — malicious bots may ignore it. Never use robots.txt to hide sensitive content; use proper authentication instead.
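To keep a page out of search results, the page must remain crawlable and carry a noindex signal, delivered either way shown below (illustrative fragments, not a complete server configuration):

```
HTTP response header:   X-Robots-Tag: noindex
HTML tag in <head>:     <meta name="robots" content="noindex">
```

Note the interaction: if the page is also blocked in robots.txt, crawlers never fetch it and therefore never see the noindex signal.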

Key Features

  • Parses robots.txt rules for any user-agent including wildcards
  • Tests specific URL paths against Allow and Disallow directives
  • Shows the matched rule that determined the access decision
  • Handles rule precedence: most specific (longest) rule wins

Common Applications

  • Verifying that Googlebot can access important pages after updating robots.txt
  • Debugging why certain pages are not being indexed by search engines
  • Testing robots.txt rules before deploying changes to production
  • Confirming that sensitive directories are properly blocked from all crawlers