Robots.txt SEO Rule

Verify that your site has a properly configured robots.txt file that allows search engine crawling

What This Rule Checks

Fetches and validates the robots.txt file at the site root. Checks that the file exists (a 404 triggers an error), that it does not block all crawlers (Disallow: / under User-agent: *), and that it is otherwise properly configured.

Why It Matters for SEO & GEO

The robots.txt file controls which pages search engines can crawl. A missing or misconfigured file can block your entire site from being indexed or waste crawl budget.

How to Fix

Create a robots.txt file at your domain root. Allow crawling of important pages. Use specific Disallow rules instead of blocking everything. Include a Sitemap directive.

Examples

Bad

User-agent: *
Disallow: /

Good

User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml

How VibeLinter Checks Robots.txt

VibeLinter’s crawl/robots-txt rule performs these checks:

  1. File existence — Fetches {domain}/robots.txt and errors if it returns a 404 status
  2. Complete blocking detection — Parses the file to detect User-agent: * combined with Disallow: /, which blocks all crawlers from the entire site
  3. Line-by-line analysis — Reads each directive line to identify the specific user-agent associated with complete disallow rules
  4. Network error handling — Reports domain accessibility issues (DNS failures, connection refused)
  5. Success confirmation — Reports an info message when robots.txt is found and properly configured
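
Checks 1 and 2 above can be sketched roughly as follows. This is an illustrative reimplementation, not VibeLinter's actual source; the function names are hypothetical, and the grouping of consecutive User-agent lines follows the Robots Exclusion Protocol convention:

```javascript
// Check 2 sketch: detect a complete crawler block, i.e. a group that
// contains `User-agent: *` and a `Disallow: /` directive.
function detectsCompleteBlock(robotsTxt) {
  let agents = [];        // user-agents of the current group
  let inAgentRun = true;  // consecutive User-agent lines form one group
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.replace(/#.*$/, '').trim(); // strip comments
    if (!line) continue;
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === 'user-agent') {
      if (!inAgentRun) agents = []; // a new group starts
      agents.push(value);
      inAgentRun = true;
    } else {
      inAgentRun = false;
      if (field === 'disallow' && value === '/' && agents.includes('*')) {
        return true; // all crawlers are blocked from the whole site
      }
    }
  }
  return false;
}

// Check 1 sketch: fetch {domain}/robots.txt (Node 18+ global fetch)
// and classify the result the way the rule's messages describe.
async function checkRobotsTxt(domain) {
  const res = await fetch(`${domain}/robots.txt`);
  if (res.status === 404) {
    return { level: 'error', message: 'robots.txt not found (404)' };
  }
  const body = await res.text();
  if (detectsCompleteBlock(body)) {
    return { level: 'error', message: 'robots.txt blocks all crawlers' };
  }
  return { level: 'info', message: 'robots.txt found and properly configured' };
}
```

Note that a Disallow: / under a specific user-agent (say, a single bot) does not trip the complete-block check; only the wildcard group does.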

Configuration

// vibelinter.config.cjs
module.exports = {
  rules: {
    'crawl/robots-txt': {
      enabled: true,
      severity: 'error'
    }
  }
}

SEO Impact

Proper robots.txt configuration affects:

  • Crawl access — Determines which pages search engines can discover and index
  • Crawl budget — Blocking unimportant pages preserves budget for valuable content
  • Indexation control — Works alongside meta robots tags and canonical URLs for complete crawl management
  • GEO (Generative Engine Optimization) — AI crawlers (GPTBot, Google-Extended, etc.) check robots.txt before accessing content; proper configuration ensures your content is available to AI systems while blocking unwanted bots
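
As one illustration of the GEO point above, a robots.txt can keep general crawling open while opting individual AI crawlers in or out. GPTBot and Google-Extended are real user-agent tokens, but which bots to allow is a per-site policy decision, not a recommendation of this rule:

```
# Opt out of Gemini training data collection only
User-agent: Google-Extended
Disallow: /

# GPTBot and all other crawlers fall through to the default group
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```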

Related SEO Topics

robots.txt SEO, crawl directives file, search engine crawling, robots.txt checker, disallow directive, crawl budget optimization