Robots.txt is a plain text file at the root of a site that tells crawlers which URLs they may or may not request.
The file lives at one fixed location, the root of the host it governs, for example https://example.com/robots.txt; each subdomain needs its own file. It follows the Robots Exclusion Protocol: a set of rules grouped under one or more user-agent lines, each followed by Allow and Disallow directives. A crawler reads the group that matches its own name, falls back to the wildcard group if there is no exact match, and obeys those rules before requesting pages.
User-agent: *
Disallow: /cart/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
The key thing to understand is that robots.txt controls crawling, not indexing. Blocking a URL stops compliant bots from fetching it, which helps preserve crawl budget on large sites and keeps bots out of low-value areas like faceted search or internal scripts. But a blocked URL can still appear in search results, usually with no description, if other pages link to it: the search engine can index the URL from those links alone, and because it never fetches the page, it never sees a noindex directive placed there. To keep a page out of the index, allow crawling and use a noindex robots meta tag or X-Robots-Tag header instead. Compliance with robots.txt is voluntary: major search engines honor it, but it is not a security control, and the file is publicly readable.
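One way to check how a compliant crawler would read the example rules is Python's standard-library urllib.robotparser. The sketch below is a minimal illustration, not a model of any particular search engine's parser: the bot name "ExampleBot" and the test URLs are assumptions made up for the example, and the rules are parsed from a string rather than fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article, inlined so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search?
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a hypothetical crawler name. It has no group of its own,
# so the wildcard (*) group applies to it.
for url in (
    "https://example.com/cart/checkout",
    "https://example.com/products/widget",
):
    verdict = "allowed" if parser.can_fetch("ExampleBot", url) else "blocked"
    print(f"{url}: {verdict}")

# Sitemap lines are exposed separately from the crawl rules (Python 3.8+).
print(parser.site_maps())
```

Note that the result only says whether a URL may be crawled. It says nothing about whether the URL can be indexed, which is the distinction drawn above.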
Related: Robots.txt complete reference, Crawl budget explained, Sitemap Index
Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.