Robots.txt checker for SEO, crawlers, and sitemap discovery

Pull the live robots.txt file from any public website and review User-agent groups, Allow and Disallow path rules, optional Crawl-delay lines, and every Sitemap declaration in one place. Use it for technical SEO audits, pre-launch checklists, migration QA, and debugging accidental crawl blocks before rankings and discoverability suffer.

How to use this robots.txt analyzer

Paste a canonical homepage URL (with https://), a bare domain such as example.com, or a direct link to /robots.txt. Submit to fetch the file from our servers with redirect following and SSRF protections aligned with the rest of this toolkit. Start with the HTTP status and final URL: a 404 often means “no file,” which behaves differently from a 200 with restrictive rules. Then read parsed groups from top to bottom—each User-agent section applies to the bots named until the next User-agent line.
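The normalization step described above can be sketched in a few lines. This is a simplified illustration (the function name `robots_url` and the https default for bare domains are assumptions, not this tool's actual implementation):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(raw: str) -> str:
    """Normalize user input (bare domain, homepage URL, or deep path)
    to the origin's /robots.txt URL, where crawlers expect the file."""
    if "://" not in raw:
        raw = "https://" + raw  # assume https for bare domains
    parts = urlsplit(raw)
    # Keep only scheme + host; drop path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

robots_url("example.com")                    # "https://example.com/robots.txt"
robots_url("https://example.com/blog/post")  # "https://example.com/robots.txt"
```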

Cross-check indexing signals beyond the file: use the meta tags extractor for robots meta and viewport, the canonical tag checker for duplicate URL consolidation, and the HTTP header checker for X-Robots-Tag or cache headers that interact with crawling.

Guide: reading Allow, Disallow, and Sitemap lines

Modern crawlers follow RFC 9309-style matching: rules use URL path prefixes, Allow can carve exceptions out of a broad Disallow, and the most specific rule wins when both match. A lone User-agent: * block is the default policy for unspecified bots; vendor-specific blocks (for example Googlebot or Bingbot) override details only for those user agents. Sitemap lines advertise XML sitemap endpoints—prefer absolute https:// URLs so parsers do not have to guess the host.
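The longest-match evaluation can be sketched as a small function. This is a simplified model of RFC 9309 precedence (prefix matching only; it ignores the `*` wildcards and `$` anchors the spec also defines):

```python
def decide(path: str, rules: list[tuple[str, str]]) -> bool:
    """RFC 9309-style sketch: among all rules whose value is a prefix
    of the path, the longest one wins; Allow beats Disallow on ties.
    Returns True when fetching is permitted."""
    best = ("allow", "")  # no matching rule => allowed by default
    for kind, value in rules:
        if path.startswith(value) and len(value) >= len(best[1]):
            if len(value) > len(best[1]) or kind == "allow":
                best = (kind, value)
    return best[0] == "allow"

rules = [("disallow", "/shop/"), ("allow", "/shop/sale/")]
decide("/shop/cart", rules)        # False: /shop/ is the longest match
decide("/shop/sale/boots", rules)  # True: Allow carves an exception
```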

Remember: robots.txt controls fetching, not indexing. If a URL is disallowed but heavily linked, search engines may still index and list it without a description, because they never fetch the page. When you need stronger removal behavior, use on-page or header-level noindex and leave the URL crawlable so the directive can actually be read, and keep marketing parameters under control with the redirect chain checker so crawlers reach stable URLs efficiently.
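As a minimal illustration of spotting a header-level noindex, here is a sketch that inspects an X-Robots-Tag value (assuming a plain dict of response headers; real X-Robots-Tag values can also carry per-agent scopes such as `googlebot: noindex`, which this sketch ignores):

```python
def has_noindex(headers: dict[str, str]) -> bool:
    """Return True when the X-Robots-Tag header carries a noindex
    directive (comma-separated, case-insensitive)."""
    tag = headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in tag.split(",")}
    return "noindex" in directives

has_noindex({"X-Robots-Tag": "noindex, nofollow"})  # True
has_noindex({"X-Robots-Tag": "nofollow"})           # False
```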

When to re-check robots.txt in your workflow

  • After site migrations, CMS changes, or enabling a CDN/WAF that might serve a different edge response.
  • Before and after staging merges—never copy staging disallow rules into production by accident.
  • When Search Console reports crawl anomalies on sections you expect indexed.
  • When you add or split sitemap files; list them in robots.txt and confirm they resolve with the response code checker.
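When you split sitemap files, a hypothetical robots.txt listing them looks like this (Sitemap lines apply to the whole file, not to any User-agent group, and should use absolute https:// URLs):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-posts.xml
```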

Infrastructure checks that complement robots.txt

Crawlers need resolvable DNS, valid TLS, and consistent hostnames. Use the DNS lookup tool for A/AAAA/CNAME correctness, the SSL certificate checker before certificate expiry interrupts fetches, and the broken link checker after you change URL patterns referenced in Allow or Disallow rules.

Related free tools

Explore the full website and URL tools on the home page, or jump to a focused utility below.

  • Broken Link Checker: Scan outbound links from any URL for 404s and broken hrefs—paste a page and audit links in seconds.
  • HTTP Header Checker: Inspect HTTP response headers for any URL: cache control, content-type, CORS, and security-related values.
  • Redirect Chain Checker: Trace the full redirect path to the final URL and spot unnecessary hops hurting SEO and performance.
  • SSL Certificate Checker: Verify TLS certificate validity, expiry, issuer, and chain for any domain before users hit errors.
  • DNS Lookup Tool: Query A, AAAA, MX, CNAME, TXT, NS, and SOA records for troubleshooting email, hosting, and DNS.
  • WHOIS Lookup: Look up domain registration details: registrar, dates, and status for research and due diligence.
  • IP Address Lookup: Resolve IPv4 or IPv6 to geolocation, ISP, ASN, and hostname for network and fraud analysis.
  • Domain Age Checker: See how long a domain has been registered—useful for SEO trust signals and quick vetting.
  • Meta Tags Extractor: Extract title, meta description, Open Graph, Twitter Card, and canonical tags from any live URL.
  • Open Graph Preview: Preview how a link may appear when shared on social networks before you publish or pitch.

Frequently asked questions

What is robots.txt and why does it matter for SEO?
robots.txt is a plain-text file at the root of a site (for example https://example.com/robots.txt) that gives crawlers hints about which URLs they should or should not fetch. It does not guarantee removal from search results—noindex and canonical signals handle indexing—but a bad robots.txt can block important sections, hide sitemap URLs, or slow audits when teams cannot read the live file quickly.
How does this robots.txt checker fetch the file?
You enter a site URL or hostname. We normalize it to the site origin and request /robots.txt over the public web, following a limited number of HTTP redirects with the same safety checks as our other website tools. The response status, final URL, and body are shown together with a parsed summary of User-agent groups, Allow/Disallow lines, and Sitemap declarations.
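The grouping step can be sketched as a small parser. This is a simplified model, not this tool's implementation (it treats consecutive User-agent lines as one group, ignores comments, and skips empty-valued lines such as a bare `Disallow:`):

```python
def parse_robots(body: str):
    """Simplified parser: returns ({agent: [(rule, path), ...]}, [sitemaps]).
    Consecutive User-agent lines share the rules that follow them."""
    groups, sitemaps, agents = {}, [], []
    seen_rule = True  # a User-agent line after rules starts a new group
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # comments are ignored
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if not value:
            continue
        if field == "user-agent":
            if seen_rule:
                agents, seen_rule = [], False
            agents.append(value)
            groups.setdefault(value, [])
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in agents:
                groups[agent].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)  # Sitemap lines apply file-wide
    return groups, sitemaps
```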
Does robots.txt block indexing?
Disallow in robots.txt asks crawlers not to fetch URLs; it is not the same as a noindex directive. Pages can still appear in results if linked elsewhere without being crawled. For de-indexing, use meta robots or X-Robots-Tag (and remove conflicting signals) in addition to crawl policy.
What is the difference between Allow and Disallow?
Both use path-prefix rules (RFC 9309). Disallow marks paths that should not be crawled; Allow carves exceptions out of broader Disallow rules. When both match, the longer (more specific) rule wins. This tool lists the lines as published so you can compare blocks for * versus specific bots such as Googlebot or Bingbot.
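You can experiment with Allow/Disallow interplay using Python's standard library. Note one caveat: `urllib.robotparser` evaluates rules in the order they appear (first match wins) rather than by longest match, so the Allow line is placed before the broader Disallow here, where both interpretations agree:

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Allow: /private/docs/
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(ROBOTS.splitlines())
rp.can_fetch("*", "https://example.com/private/docs/guide")  # True
rp.can_fetch("*", "https://example.com/private/notes")       # False
```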
Why might this tool show something different from my browser?
CDNs, geo routing, A/B splits, and bot management can serve different responses by IP or headers. We use a fixed server-side fetch. If you need to compare headers or final URLs, use our HTTP header checker and redirect chain checker on the same hostname.
Can I use a direct link to robots.txt instead of the homepage?
Yes. If you paste a full URL to /robots.txt, we fetch that address. If you paste any other path on the site, we still resolve the origin and request /robots.txt at the root, which is where crawlers expect the file.
What are common robots.txt mistakes?
Accidental Disallow: / for all agents, outdated disallow rules after migrations, conflicting Allow/Disallow order, missing or relative Sitemap URLs, and deploying staging rules to production. After changes, re-fetch with this checker and validate important URLs with the response code checker.
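The classic staging leak is a catch-all block shipped to production. A hypothetical before/after:

```
# Staging rules accidentally deployed to production — blocks everything:
User-agent: *
Disallow: /

# Production intent — block only the cart, advertise the sitemap:
User-agent: *
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
```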
Is Crawl-delay supported by Google?
Google generally ignores Crawl-delay in robots.txt for Googlebot. Other crawlers may still honor it. We surface Crawl-delay lines when present so you can document legacy or non-Google behavior.
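Python's standard library parser does read Crawl-delay, which is handy for documenting what non-Google crawlers might see:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
User-agent: *
Crawl-delay: 10
""".splitlines())
rp.crawl_delay("*")  # 10 (seconds, for bots that honor the directive)
```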
Why do I see a 403 or empty body when the site works for me?
Some hosts block non-browser or data-center IPs. Timeouts, TLS issues, and WAF rules can also interfere. Retry later, verify DNS with our DNS lookup tool, and check TLS with the SSL certificate checker if failures persist.