How to use this robots.txt analyzer
Paste a canonical homepage URL (with https://), a bare domain such as example.com, or a direct link to /robots.txt. Submit to fetch the file from our servers with redirect following and SSRF protections aligned with the rest of this toolkit. Start with the HTTP status and final URL: a 404 generally tells crawlers there are no restrictions, which is very different from a 200 with restrictive rules, and persistent 5xx responses may be treated as a full disallow. Then read the parsed groups from top to bottom—each group starts with one or more consecutive User-agent lines, and its Allow/Disallow rules apply to every bot those lines name.
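The grouping logic the parser applies can be sketched in a few lines of Python. This is a simplified illustration, not the analyzer's actual implementation: it ignores wildcard matching and non-rule fields, and the function name is ours.

```python
def parse_groups(text: str):
    """Split robots.txt text into (user_agents, rules) groups.

    Sketch only: consecutive User-agent lines share one group, and a
    group ends when a new User-agent line follows at least one rule.
    """
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:                         # previous group is complete
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups

sample = """
User-agent: Googlebot
User-agent: Bingbot
Disallow: /private/

User-agent: *
Allow: /
"""
print(parse_groups(sample))
# [(['Googlebot', 'Bingbot'], [('disallow', '/private/')]), (['*'], [('allow', '/')])]
```

Note how Googlebot and Bingbot end up in one group sharing the same Disallow rule—a common source of confusion when the lines are read one at a time.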
Cross-check indexing signals beyond the file: use the meta tags extractor for robots meta and viewport, the canonical tag checker for duplicate URL consolidation, and the HTTP header checker for X-Robots-Tag or cache headers that interact with crawling.
Guide: reading Allow, Disallow, and Sitemap lines
Modern crawlers follow RFC 9309-style matching: rules are URL path prefixes (most implementations also support * and $ wildcards), Allow can carve exceptions out of a broad Disallow, and when several rules match, the longest one wins, with ties going to Allow. A lone User-agent: * group is the default policy for bots not named elsewhere; a vendor-specific group (for example Googlebot or Bingbot) replaces, rather than extends, the * group for that user agent. Sitemap lines advertise XML sitemap endpoints—prefer absolute https:// URLs so parsers do not have to guess the host.
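The longest-match rule is worth seeing concretely. Here is a minimal sketch of that decision (plain prefixes only; * and $ wildcards are omitted for brevity, and the function name is ours):

```python
def rfc9309_allowed(path: str, rules) -> bool:
    """Decide fetch permission by longest matching prefix, ties to Allow.

    rules: (directive, path_prefix) pairs in file order.
    """
    verdict, best_len = True, -1          # nothing matches => allowed
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            verdict, best_len = (directive == "allow"), len(prefix)
    return verdict

rules = [("disallow", "/private/"), ("allow", "/private/docs/")]
print(rfc9309_allowed("/private/docs/readme.html", rules))  # True: Allow is longer
print(rfc9309_allowed("/private/secret.html", rules))       # False
print(rfc9309_allowed("/public/page.html", rules))          # True: nothing matches
```

The middle case is the classic "Allow carves an exception" pattern: the broad Disallow still applies everywhere the longer Allow prefix does not reach.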
Remember: robots.txt manages fetch permission, not guaranteed de-indexing. If a URL is disallowed but heavily linked, search engines may still index it and show it in results—typically without a description, since they cannot fetch the page. For stronger removal, use an on-page or header-level noindex and make sure the URL remains crawlable: a crawler blocked by robots.txt never sees the noindex directive. Keep marketing parameters under control with the redirect chain checker so crawlers reach stable URLs efficiently.
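Checking for a header-level noindex is a simple string test once you have the response headers. A minimal sketch, assuming a plain dict of header names to values (real responses can repeat the header and scope it to a user agent, e.g. "googlebot: noindex", which this ignores):

```python
def header_noindex(headers: dict) -> bool:
    """True if an X-Robots-Tag header carries noindex (or none)."""
    tag = headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in tag.split(",")}
    return "noindex" in directives or "none" in directives

print(header_noindex({"X-Robots-Tag": "noindex, nofollow"}))  # True
print(header_noindex({"Content-Type": "text/html"}))          # False
```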
When to re-check robots.txt in your workflow
- After site migrations, CMS changes, or enabling a CDN/WAF that might serve a different edge response.
- Before and after staging merges—never copy staging disallow rules into production by accident.
- When Search Console reports crawl anomalies on sections you expect indexed.
- When you add or split sitemap files; list them in robots.txt and confirm they resolve with the response code checker.
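A quick way to audit the last point is to pull every Sitemap line out of the file and flag any that are not absolute https:// URLs. A short sketch of that lint (the flag is our convention, not a spec requirement—relative sitemap URLs are simply ambiguous for parsers):

```python
from urllib.parse import urlparse

def sitemap_lines(text: str):
    """Return (url, is_absolute_https) for each Sitemap line in robots.txt text."""
    results = []
    for raw in text.splitlines():
        field, _, value = raw.partition(":")
        if field.strip().lower() != "sitemap":
            continue
        url = value.strip()
        parts = urlparse(url)
        results.append((url, parts.scheme == "https" and bool(parts.netloc)))
    return results

sample = (
    "User-agent: *\n"
    "Disallow:\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "Sitemap: /sitemap-news.xml\n"
)
print(sitemap_lines(sample))
# [('https://example.com/sitemap.xml', True), ('/sitemap-news.xml', False)]
```

Any URL flagged False should be rewritten as an absolute https:// URL, then confirmed with the response code checker.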
Infrastructure checks that complement robots.txt
Crawlers need resolvable DNS, valid TLS, and consistent hostnames. Use the DNS lookup tool for A/AAAA/CNAME correctness, the SSL certificate checker before certificate expiry interrupts fetches, and the broken link checker after you change URL patterns referenced in Allow or Disallow rules.
Related free tools
Explore the full website and URL tools on the home page, or jump to a focused utility below.
- Broken Link Checker — Scan outbound links from any URL for 404s and broken hrefs—paste a page and audit links in seconds.
- HTTP Header Checker — Inspect HTTP response headers for any URL: cache control, content-type, CORS, and security-related values.
- Redirect Chain Checker — Trace the full redirect path to the final URL and spot unnecessary hops hurting SEO and performance.
- SSL Certificate Checker — Verify TLS certificate validity, expiry, issuer, and chain for any domain before users hit errors.
- DNS Lookup Tool — Query A, AAAA, MX, CNAME, TXT, NS, and SOA records for troubleshooting email, hosting, and DNS.
- WHOIS Lookup — Look up domain registration details: registrar, dates, and status for research and due diligence.
- IP Address Lookup — Resolve IPv4 or IPv6 to geolocation, ISP, ASN, and hostname for network and fraud analysis.
- Domain Age Checker — See how long a domain has been registered—useful for SEO trust signals and quick vetting.
- Meta Tags Extractor — Extract title, meta description, Open Graph, Twitter Card, and canonical tags from any live URL.
- Open Graph Preview — Preview how a link may appear when shared on social networks before you publish or pitch.