Crawl Error Types in Search Console and What Each One Actually Means

March 21, 2021
Technical SEO


Google Search Console no longer has a standalone "Crawl Errors" report. It was retired, and its signals now live in the Page Indexing report and in the Crawl Stats report under Settings. The states these reports surface get treated as fire alarms when most are routine, and ignored when a handful genuinely cost you indexed pages. This is a decoder for every state you'll see, sorted by whether it's urgent, conditional, or safe to leave alone.

Where crawl errors actually live now

Two reports carry the signal:

Page Indexing (Indexing > Pages): per-URL outcomes after Google tried to crawl and index. This is where "Not indexed" reasons appear.
Crawl Stats (Settings > Crawl stats): host-level fetch behavior, response codes, average response time, and crawl request volume over 90 days. This is where you diagnose why Googlebot is backing off.

A site can be well indexed and still list many URLs under "Not indexed" reasons that are not errors at all. Read the two reports together: Page Indexing tells you the per-page verdict, and Crawl Stats tells you whether your server is healthy enough for Google to keep crawling at all.

Urgent: states that are actively costing you indexed pages

Investigate these first. Each one means pages you likely want indexed are not.

Server error (5xx): Googlebot got a 500, 502, or 503. If this spikes, Google throttles crawling to avoid hammering a struggling host, so the damage compounds. Check Crawl Stats > By response to confirm it's systemic, then look at server logs, PHP/worker timeouts, or an origin choking under crawl load. A 503 is fine for a short maintenance window but ruinous if it lingers.
Redirect error: a redirect chain that's too long, a loop, an empty target URL, or a bad final hop. Google gives up before reaching a real page. Map the chain with curl -IL and collapse it to a single hop.
Submitted URL has crawl issue: a catch-all for a URL in your sitemap that failed in a way Google couldn't categorize. Always investigate; by listing it in the sitemap, you explicitly asked Google to index it.
Submitted URL marked 'noindex': a sitemap URL carrying a noindex tag or header. This is a contradiction: you're telling Google "index this" and "don't index this" simultaneously. Decide which is right and fix one side.
Submitted URL blocked by robots.txt: the same contradiction at the crawl layer. The page is in your sitemap but disallowed. Either remove it from the sitemap or unblock it.
Submitted URL seems to be a Soft 404: the page returns 200 OK but Google judged the content thin, empty, or error-like ("no results found", an empty cart, a stub). Either add real content and keep returning 200, or return a true 404/410 if the page shouldn't exist.
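The curl -IL check for redirect errors can also be scripted when there are many URLs to map. The following is a minimal sketch, not a reproduction of how Googlebot follows redirects: the function names, the result labels, and the MAX_HOPS value of 5 are all assumptions chosen for illustration. It walks a chain one hop at a time and reports whether it ends cleanly, loops, hits an empty target, or runs too long.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}
MAX_HOPS = 5  # assumed budget for this sketch, not a documented Google limit


def head_fetch(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(start_url, fetch=head_fetch, max_hops=MAX_HOPS):
    """Follow a redirect chain hop by hop and describe how it ends.

    `problem` is None for zero or one redirect, otherwise one of the
    illustrative labels: multi_hop, loop, empty_target, too_long.
    """
    chain = [start_url]
    url = start_url
    while True:
        status, location = fetch(url)
        if status not in REDIRECT_CODES:
            hops = len(chain) - 1
            problem = "multi_hop" if hops > 1 else None
            return {"chain": chain, "final_status": status, "problem": problem}
        if not location:
            return {"chain": chain, "final_status": status, "problem": "empty_target"}
        next_url = urljoin(url, location)  # resolve relative Location values
        if next_url in chain:
            return {"chain": chain + [next_url], "final_status": status, "problem": "loop"}
        chain.append(next_url)
        if len(chain) - 1 > max_hops:
            return {"chain": chain, "final_status": status, "problem": "too_long"}
        url = next_url
```

Calling trace_redirects("https://example.com/old-page") would issue live HEAD requests against a placeholder URL. Any result other than None in the problem field is a candidate for collapsing to a single hop. Some servers answer HEAD differently from GET, so treat a surprising result as a prompt to re-check with a GET.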

Conditional: depends entirely on whether the URL should be indexed

These are the most misread states. They are errors only if the affected URL is one you wanted indexed. For canonicalized variants, faceted URLs, and intentionally blocked pages, they're the system working correctly.

Crawled - currently not indexed: Google fetched the page and chose not to index it. No technical fault; it's a quality and demand judgment. Safe to ignore for thin tag pages or near-duplicates. Urgent if it's hitting money pages or a large share of new content. That is a signal of thin or templated content, weak internal linking, or a site that's outrunning its crawl budget. Don't mass-submit these for re-indexing; improve the pages.
Discovered - currently not indexed: Google knows the URL exists but hasn't crawled it yet, often because it's holding back on crawl volume. At small scale, ignore and wait. At scale, this is a crawl-budget and site-health symptom: shrink low-value URL sprawl, speed up the server, and strengthen internal links to the stranded URLs.
Alternate page with proper canonical tag: Google followed your canonical to a different URL. Almost always correct and intended. Only a problem if the canonical you declared points at the wrong URL.
Duplicate without user-selected canonical / Duplicate, Google chose different canonical than user: Google found duplicates and either picked a canonical itself or overrode yours. If Google's pick matches your intent, leave it. If it ignored your canonical, your duplicate-clustering signals (canonical tags, internal links, sitemap entries) are inconsistent; align them.
Page with redirect: the URL redirects elsewhere. Expected for retired or consolidated URLs. Not an error.
Excluded by 'noindex' tag / Blocked by robots.txt: Google obeyed your directive. Fine when intentional. Only a problem when you blocked something by accident (a staging rule that leaked to production, an over-broad Disallow).
Blocked due to unauthorized request (401) / Blocked due to access forbidden (403): Googlebot was denied. Intended for genuinely gated content. An error if a CDN, WAF, or bot-protection rule is wrongly blocking Googlebot. Verify with the URL Inspection tool's live test and check that you're not serving 403s based on user agent.
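One way to catch an over-broad Disallow before it surfaces in Search Console is to test the URLs you want indexed against your robots.txt. The sketch below uses Python's standard urllib.robotparser; the function name and the example URLs are hypothetical. One important caveat: this parser applies rules in file order and matches plain path prefixes, so it does not reproduce Google's wildcard and longest-match handling. Treat it as a first-pass screen and confirm anything surprising with URL Inspection.

```python
from urllib.robotparser import RobotFileParser


def blocked_for_googlebot(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows.

    Approximation only: urllib.robotparser uses first-match prefix
    rules, not Google's wildcard and longest-match semantics.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt where a staging rule leaked into production.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
Disallow: /cart
"""

WANTED_URLS = [
    "https://example.com/products/widget",
    "https://example.com/staging/home",
    "https://example.com/cart",
]

blocked = blocked_for_googlebot(ROBOTS_TXT, WANTED_URLS)
for url in blocked:
    print("blocked:", url)
```

Feeding it the URL list from your sitemap turns this into a quick check for the sitemap-versus-robots.txt contradiction as well.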

Usually safe to ignore

Not found (404): a normal part of the web. Only worth attention if the 404 was a URL with traffic, links, or sitemap inclusion. A clean 404 is a valid, correct response; you don't need to redirect every one.
Soft 404 (the non-submitted variant): same fix as the submitted variant, but lower priority unless it's a real page you want ranking. Return the correct status code.
Page indexed without content: rarer. Google indexed the URL but couldn't read meaningful content, often because of a rendering or cloaking issue. Worth a live render check if it appears on real pages.

Reading the Crawl Stats report

Crawl Stats is host-level and answers "is Google able to crawl us efficiently?" Watch three things:

By response: a healthy site is dominated by 200, plus some 304 Not Modified, which is good because it means caching is working. Rising shares of 5xx, 429 (too many requests), or timeouts mean Google is being actively rejected and will reduce its crawl rate.
Average response time: if this climbs, Googlebot crawls fewer URLs per day to be polite. Slow responses are the most common hidden cause of Discovered - currently not indexed at scale.
By purpose (Refresh vs Discovery): heavy Refresh with little Discovery on a site that's publishing new content suggests Google isn't finding new URLs; fix sitemaps and internal linking. A flood of Discovery on junk URLs means parameter or faceted sprawl is eating crawl budget.
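The By response breakdown can be cross-checked against your own access logs. This is a rough sketch that assumes the common "combined" log format and identifies Googlebot by user-agent string alone; the function name, the bucket labels, and the sample lines are illustrative. User agents can be spoofed, so a serious audit would also verify the requesting IPs, which this sketch does not attempt.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer, and user agent fields
# of a combined-format access log line (an assumed log format).
LOG_LINE = re.compile(
    r'"\S+ \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_status_shares(lines):
    """Return the share of Googlebot requests per status bucket.

    429 gets its own bucket; everything else is grouped by class
    (2xx, 3xx, 4xx, 5xx). Lines that do not parse are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        status = int(match.group("status"))
        bucket = "429" if status == 429 else f"{status // 100}xx"
        counts[bucket] += 1
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()} if total else {}
```

Run it over one day of logs at a time, for example with googlebot_status_shares(open("access.log")) where the file name is a placeholder. A 5xx or 429 share that grows from one day to the next is the same signal Crawl Stats shows, usually available sooner.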

Common mistakes

Treating "Not indexed" totals as a problem count. Most of that bucket is intentional: canonicalized, redirected, or noindexed URLs. Filter to reasons that affect pages you actually want indexed.
Mass-requesting indexing for "Crawled - currently not indexed." It doesn't fix the cause and can look manipulative. Improve the page or its internal links instead.
Redirecting every 404. Blanket redirects to the homepage create soft 404s. Let dead pages return 404/410.
Ignoring sample-based reporting. GSC shows example URLs, not the full list. The example you see may be fixed while the underlying issue persists on URLs you can't see. Fix the pattern, not just the sample.
Validating a fix before it's actually deployed. Hitting "Validate fix" while the bad response still ships restarts a multi-week clock and delays recovery. Confirm with a live URL Inspection first.

A fast triage order

Open Crawl Stats. If 5xx/timeouts are elevated or response time is climbing, fix the server first. Nothing else matters until Google can crawl.
In Page Indexing, sort errors by trend. A sudden spike usually means one deploy or rule change broke many URLs at once, so find the common cause.
Separate "should be indexed" URLs from intentional exclusions. Only the former are real crawl errors.
Fix the pattern, deploy, verify with live inspection, then click Validate fix.
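The third step, separating "should be indexed" URLs from intentional exclusions, is easy to script once you have a list of affected URLs and their reasons. The sketch below is an illustration under assumptions: the (url, reason) row shape, the function name, and the bucket names are hypothetical rather than a Search Console export format, and the urgent set simply mirrors the labels listed in this article.

```python
# Reasons this article treats as urgent regardless of the URL.
URGENT_REASONS = {
    "Server error (5xx)",
    "Redirect error",
    "Submitted URL has crawl issue",
    "Submitted URL marked 'noindex'",
    "Submitted URL blocked by robots.txt",
    "Submitted URL seems to be a Soft 404",
}


def triage(rows, wanted_urls):
    """Split (url, reason) rows into urgent, review, and ignore.

    urgent: the reason is in URGENT_REASONS.
    review: any other reason on a URL you want indexed.
    ignore: any other reason on a URL you never wanted indexed.
    """
    wanted = set(wanted_urls)
    buckets = {"urgent": [], "review": [], "ignore": []}
    for url, reason in rows:
        if reason in URGENT_REASONS:
            buckets["urgent"].append((url, reason))
        elif url in wanted:
            buckets["review"].append((url, reason))
        else:
            buckets["ignore"].append((url, reason))
    return buckets


# Hypothetical rows, as if copied from the Page Indexing examples.
rows = [
    ("https://example.com/pricing", "Server error (5xx)"),
    ("https://example.com/guide", "Crawled - currently not indexed"),
    ("https://example.com/shoes?color=red", "Alternate page with proper canonical tag"),
    ("https://example.com/old-promo", "Not found (404)"),
]
wanted_urls = ["https://example.com/pricing", "https://example.com/guide"]

result = triage(rows, wanted_urls)
for bucket, items in result.items():
    print(bucket, len(items))
```

The wanted list is the part that takes judgment. A sitemap is a reasonable starting point for it, provided the sitemap only contains URLs you actually want indexed.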

The discipline that saves the most time: never assume a Search Console state is bad because it sits under a scary heading. Decide what each affected URL was supposed to do, and the urgent handful separates itself from the noise immediately.


Claude Vincent

Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.
