
The robots meta tag and the X-Robots-Tag HTTP header give crawlers the same instructions (noindex, nofollow, noarchive, and the rest), but they deliver them through completely different channels. The meta tag lives inside an HTML <head>; the header travels in the server response itself. That single distinction decides which tool can reach a given URL, and getting it wrong is one of the most common reasons unwanted files keep showing up in search results.
The core difference: where the directive lives
A robots meta tag is markup. It only exists if the response body is HTML and the crawler parses it:
- <meta name="robots" content="noindex, nofollow">: applies to all crawlers
- <meta name="googlebot" content="noindex">: targets a single user agent
The X-Robots-Tag is part of the HTTP response header, configured at the server level and sent before the body. It works on any resource the server delivers, regardless of file type:
- X-Robots-Tag: noindex
- X-Robots-Tag: googlebot: noindex, nofollow (user-agent-scoped)
- X-Robots-Tag: noindex, nosnippet (multiple directives, comma-separated)
Google honors the identical vocabulary in both places. The header is simply the only option when there is no HTML head to write into, and the more scalable option when you need to govern thousands of URLs at once.
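To make the header syntax concrete, here is a minimal Python sketch that groups directives by user agent from one or more X-Robots-Tag values. The function name `parse_x_robots_tag` and the `VALUE_DIRECTIVES` set are illustrative assumptions, not part of any library, and the parsing is deliberately simplified: it does not handle `unavailable_after` dates that themselves contain commas.

```python
from collections import defaultdict

# Directives that take a "name: value" form, so a leading "name:" here is
# NOT a user-agent prefix. Assumed list; check the search engine's docs.
VALUE_DIRECTIVES = {
    "unavailable_after",
    "max-snippet",
    "max-image-preview",
    "max-video-preview",
}


def parse_x_robots_tag(values):
    """Group directives by user agent ("*" means all crawlers)."""
    rules = defaultdict(set)
    for value in values:
        agent = "*"
        head, sep, tail = value.partition(":")
        prefix = head.strip().lower()
        # Treat the prefix as a user agent only if it is a single token
        # that is not itself a value-carrying directive name.
        if sep and "," not in head and prefix not in VALUE_DIRECTIVES:
            agent = prefix
            value = tail
        for directive in value.split(","):
            directive = directive.strip().lower()
            if directive:
                rules[agent].add(directive)
    return dict(rules)


if __name__ == "__main__":
    headers = [
        "noindex",
        "googlebot: noindex, nofollow",
        "noindex, nosnippet",
    ]
    print(parse_x_robots_tag(headers))
```

The same grouping applies conceptually to meta tags, where the `name` attribute plays the role of the user-agent prefix.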
A decision framework
Ask two questions, in order. A "yes" to either one points to the header; if both answers are "no," the meta tag is the default.
- Is the resource non-HTML? PDFs, images, videos, spreadsheets, JSON feeds, plain-text files, and downloadable binaries have no <head>. The meta tag is physically impossible to embed. Use X-Robots-Tag. This is the single most important rule, because PDFs and images are exactly the assets that slip into the index unintentionally.
- Are you controlling a bulk URL pattern? If the rule applies to an entire directory, a query-parameter signature, a file extension, or a generated section of the site, a server-level header lets you express it once and have it apply everywhere. Editing the head of every matching template, or worse, every static file, does not scale and drifts out of sync.
- Otherwise, use the meta tag. For an individual HTML page where the content team or CMS owns the template, the meta tag is closer to the content, easier to audit in "view source," and doesn't require server access. This covers most thin pages, internal search results rendered as HTML, and paginated or filtered views.
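The framework above can be sketched as a small Python function. The name `choose_mechanism` and its parameters are hypothetical; the set of HTML content types is an assumption about what counts as "has a head to write into."

```python
# Content types assumed to carry an HTML <head>.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def choose_mechanism(content_type: str, bulk_pattern: bool) -> str:
    """Return which mechanism the decision framework points to."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()

    # Question 1: non-HTML resources have no <head>, so only the header works.
    if media_type not in HTML_TYPES:
        return "X-Robots-Tag"

    # Question 2: a rule covering a whole URL pattern is expressed once
    # at the server level instead of in every template.
    if bulk_pattern:
        return "X-Robots-Tag"

    # Otherwise: an individual HTML page, where the meta tag is simplest.
    return "meta robots"


if __name__ == "__main__":
    print(choose_mechanism("application/pdf", bulk_pattern=False))
    print(choose_mechanism("text/html; charset=utf-8", bulk_pattern=True))
    print(choose_mechanism("text/html", bulk_pattern=False))
```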
What only the header can do
These are the scenarios where the meta tag is not merely inconvenient but unavailable:
- PDFs and office documents. A white paper, price list, or manual that you want available to users but absent from the index.
- Media files. Images you don't want appearing in image search, or video files served directly.
- Generated and exported assets. CSV exports, .txt files, site data feeds, and API responses that are publicly reachable but shouldn't rank.
- Staging or sensitive directories where you'd rather emit noindex across every file type than rely on per-page markup.
Implementation patterns
On Apache, target a file type across the whole site with a single block in .htaccess or the vhost config:
<FilesMatch "\.(pdf|docx|xlsx)$">
    Header set X-Robots-Tag "noindex, noarchive"
</FilesMatch>
On Nginx, the equivalent uses a location block:
location ~* \.(pdf|docx|xlsx)$ {
    add_header X-Robots-Tag "noindex, noarchive";
}
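A note on Nginx: add_header directives are inherited from the enclosing level only if the current block defines no add_header directives of its own, so a nested block with its own headers silently drops this one. Without the always parameter, the header is also added only for a fixed set of success and redirect status codes, so it is skipped on most error responses. Verify the header actually appears on a real request rather than assuming the config is live.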
You can also set the header dynamically in application code (PHP's header(), a middleware layer, a CDN edge worker, or your framework's response object), which is ideal when the indexing decision depends on logic, for example, emitting noindex on filtered listing URLs that carry more than one query parameter.
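As one way to picture the application-level approach, here is a minimal WSGI middleware sketch in Python that adds `X-Robots-Tag: noindex` when a request carries more than one query parameter. The names `should_noindex` and `RobotsHeaderMiddleware` are hypothetical, and counting distinct parameter names (rather than raw key/value pairs) is an assumption about what "more than one query parameter" should mean for your site.

```python
from urllib.parse import parse_qsl


def should_noindex(query_string: str) -> bool:
    """True when the URL carries more than one distinct query parameter."""
    params = parse_qsl(query_string, keep_blank_values=True)
    return len({name for name, _ in params}) > 1


class RobotsHeaderMiddleware:
    """Wrap a WSGI app and append X-Robots-Tag on filtered listing URLs."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        noindex = should_noindex(environ.get("QUERY_STRING", ""))

        def patched_start_response(status, headers, exc_info=None):
            if noindex:
                # Replace any existing value so only one header is sent.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)
```

Wrapping is a one-liner, for example `application = RobotsHeaderMiddleware(application)`. Keeping the rule in a separate function makes the indexing decision easy to unit-test apart from the server.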
The crawlability prerequisite, true for both
Neither mechanism works if the URL is blocked in robots.txt. This trips people up constantly: a crawler that is disallowed from fetching a URL never sees the response, and therefore never sees the noindex in the header or the head. A disallowed page can still be indexed from external links, shown with no snippet, precisely because the directive telling it to drop out was never read.
The fix is to allow crawling and apply noindex. Let the crawler in, let it read the directive, let it drop the URL. Only after the page has been recrawled and de-indexed should you consider blocking it in robots.txt to save crawl budget.
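A quick pre-flight check for this prerequisite can be sketched with Python's standard-library `urllib.robotparser`. The helper name `noindex_is_readable` and the example URLs are hypothetical. Note that the standard-library parser uses simple prefix matching and is only an approximation of how Google evaluates robots.txt (it does not implement wildcard patterns, for instance).

```python
from urllib.robotparser import RobotFileParser


def noindex_is_readable(robots_txt: str, url: str,
                        user_agent: str = "Googlebot") -> bool:
    """Return True if the crawler may fetch the URL, and so can see noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    for url in ("https://example.com/private/report.pdf",
                "https://example.com/public/report.pdf"):
        state = "readable" if noindex_is_readable(robots, url) else "blocked"
        print(url, "->", state)
```

A URL reported as blocked here is one where a noindex directive, in either the header or the head, would never be read.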
Common mistakes
- Trying to noindex a PDF with a meta tag. There's nowhere to put it. The asset stays indexed until you switch to the header.
- Combining Disallow in robots.txt with noindex. They cancel each other out: the block prevents the directive from ever being seen.
- Setting both a meta tag and a header with conflicting values. When a crawler sees two directives for the same signal, the more restrictive one generally wins, so a stray noindex anywhere can quietly de-index a page you wanted live. Audit for duplicates.
- Assuming the header is set without checking. Confirm with curl -I https://example.com/file.pdf and look for the X-Robots-Tag line in the response. Server config caveats mean "I added it" and "it's being sent" are not the same thing.
- Forgetting that directives are case-insensitive but the vocabulary is fixed. Only documented values (noindex, nofollow, noarchive, nosnippet, noimageindex, unavailable_after, etc.) do anything. Invented values are ignored silently.
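Two of these mistakes, conflicting values and invented values, lend themselves to an automated audit. The Python sketch below combines a page's meta robots content with its X-Robots-Tag value, models "the more restrictive one wins" as a union of restrictive directives, and flags anything outside a known vocabulary. The function name `audit_directives` is hypothetical, the vocabulary sets are an assumed and non-exhaustive list, and the header value is assumed to carry no user-agent prefix.

```python
# Assumed, non-exhaustive vocabulary; check the search engine's documentation.
RESTRICTIVE = {
    "noindex", "nofollow", "none", "noarchive",
    "nosnippet", "noimageindex", "unavailable_after",
}
# Permissive defaults that never override a restrictive directive.
PERMISSIVE_DEFAULTS = {"all", "index", "follow"}


def _tokens(raw: str) -> set:
    """Lower-case directive names, dropping any ': value' suffix."""
    return {
        part.strip().lower().split(":")[0].strip()
        for part in raw.split(",")
        if part.strip()
    }


def audit_directives(meta_content: str, header_value: str):
    """Return (effective restrictive directives, unrecognized values)."""
    combined = _tokens(meta_content) | _tokens(header_value)
    effective = combined & RESTRICTIVE
    unknown = combined - RESTRICTIVE - PERMISSIVE_DEFAULTS
    return effective, unknown


if __name__ == "__main__":
    effective, unknown = audit_directives("index, follow", "NoIndex, noarchve")
    print("effective:", sorted(effective))
    print("unknown:", sorted(unknown))
```

In the usage example, the page template says "index, follow" but a stray header still de-indexes it, and the misspelled value is surfaced rather than silently ignored.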
FAQ
Does the header carry any ranking weight or speed penalty? No. It's a tiny string in a response you're already sending. There is no measurable performance cost to applying it across a directory.
Can I use the header on HTML pages too? Yes. It's perfectly valid, and it's the right call when you want to govern HTML pages by pattern without editing templates. The meta tag is just usually more convenient for one-off HTML pages.
How fast does removal happen? Only on the next crawl of that URL. To accelerate it, request indexing or resubmit the URL in your search console of choice, and make sure the page is internally linked enough to be recrawled promptly.
Do other search engines respect X-Robots-Tag? Google and Bing both support it. Treat compliance from smaller or non-mainstream crawlers as unreliable, and never use either mechanism as a security control. Anything you truly need hidden belongs behind authentication, not a polite request to stay out of the index.
Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.