crawl/robots-txt: Robots.txt SEO Rule
Category: crawl · Severity: error
Verify that your site has a properly configured robots.txt file that allows search engine crawling
What This Rule Checks
Fetches and validates the robots.txt file at the site root. Checks for file existence (erroring on a 404), complete crawler blocking (`Disallow: /` under `User-agent: *`), and proper configuration.
Why It Matters for SEO & GEO
The robots.txt file controls which pages search engines can crawl. A missing or misconfigured file can block your entire site from being indexed or waste crawl budget.
How to Fix
Create a robots.txt file at your domain root. Allow crawling of important pages. Use specific Disallow rules instead of blocking everything. Include a Sitemap directive.
Examples
Bad

```text
User-agent: *
Disallow: /
```

Good

```text
User-agent: *
Disallow: /admin/
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
```

How VibeLinter Checks Robots.txt
VibeLinter’s crawl/robots-txt rule performs these checks:
- File existence — Fetches `{domain}/robots.txt` and errors if it returns a 404 status
- Complete blocking detection — Parses the file to detect `User-agent: *` combined with `Disallow: /`, which blocks all crawlers
- Line-by-line analysis — Reads each directive line to identify the specific user-agent associated with complete disallow rules
- Network error handling — Reports domain accessibility issues (DNS failures, connection refused)
- Success confirmation — Reports an info message when robots.txt is found and properly configured
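The complete-blocking detection above can be sketched as a small parsing function. This is an illustrative sketch, not VibeLinter's actual source: the function name `blocksAllCrawlers` and its grouping logic are assumptions based on the rule description.

```javascript
// Sketch: detect whether a robots.txt body blocks all crawlers,
// i.e. a group containing `User-agent: *` with `Disallow: /`.
// Hypothetical helper; not VibeLinter's real implementation.
function blocksAllCrawlers(robotsTxt) {
  let currentAgents = [];     // user-agents of the group being parsed
  let groupClosed = false;    // a non-User-agent line ends the agent header run
  for (const rawLine of robotsTxt.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // strip comments and whitespace
    if (!line) continue;
    const [field, ...rest] = line.split(':');
    const key = field.trim().toLowerCase();
    const value = rest.join(':').trim();
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one rule group
      if (groupClosed) { currentAgents = []; groupClosed = false; }
      currentAgents.push(value);
    } else {
      groupClosed = true;
      if (key === 'disallow' && value === '/' && currentAgents.includes('*')) {
        return true; // wildcard group disallows the entire site
      }
    }
  }
  return false;
}
```

With this sketch, `blocksAllCrawlers("User-agent: *\nDisallow: /")` reports complete blocking, while a file that only disallows specific paths such as `/admin/` does not.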
Configuration
```js
// vibelinter.config.cjs
module.exports = {
  rules: {
    'crawl/robots-txt': {
      enabled: true,
      severity: 'error'
    }
  }
}
```
SEO Impact
Proper robots.txt configuration affects:
- Crawl access — Determines which pages search engines can discover and index
- Crawl budget — Blocking unimportant pages preserves budget for valuable content
- Indexation control — Works alongside meta robots tags and canonical URLs for complete crawl management
- GEO (Generative Engine Optimization) — AI crawlers (GPTBot, Google-Extended, etc.) check robots.txt before accessing content; proper configuration ensures your content is available to AI systems while blocking unwanted bots
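As a concrete illustration of the GEO point, a site that wants regular search crawling but opts specific AI crawlers out of training could use a robots.txt like the following. This is a sketch using example.com as a placeholder; adjust the bot list to your own policy:

```text
# Block specific AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# Allow all other crawlers, except private areas
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```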
Related SEO Topics
robots.txt SEO · crawl directives file · search engine crawling · robots.txt checker · disallow directive · crawl budget optimization