HTML Document Over 15MB: Google's Crawl Limit and How to Fix It


TL;DR

Google documents that Googlebot fetches only the first 15MB of a single HTML file, then drops the rest before indexing. This flag fires when the raw HTML of one URL crosses 15MB. The fix is to shrink the HTML itself: strip inline base64 images, oversized inline SVG, and bloated inline scripts and styles, then paginate or lazy-load enormous lists. Most sites never come close, so treat this as a rare but high-stakes alert and confirm it is real before acting.

What this means

This issue means the raw HTML document for one URL is larger than 15MB. That is the byte size of the HTML response itself, before Googlebot fetches any of the images, scripts, stylesheets, or fonts the page references. Google states that Googlebot fetches up to the first 15MB of an HTML or text-based file, then stops reading that single resource. Referenced files like images, CSS, and JavaScript are requested separately and do not count toward this document's 15MB.

For context, Google notes the median HTML file is about 30 kilobytes, roughly 500 times smaller than this limit. A page that hits 15MB of pure markup is almost always carrying something it should not: thousands of inline base64 images, a runaway server-rendered list, a giant inline data blob, or a script bundle that belongs in an external file.

Why it matters (the 15MB rule)

Google's documented behavior is simple: Googlebot downloads the bytes of the HTML file and, if that file exceeds 15MB, it keeps the first 15MB and discards the rest of that response. Whatever markup sits beyond the cutoff, including headings, body copy, links, and structured data, is never parsed for that URL. If your primary content, internal links, or schema live in the tail of a bloated document, Google may simply not see them.
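To make the cutoff concrete, here is a minimal Python sketch that checks which pieces of a document would fall past the limit. The function names are hypothetical, and reading 15MB as 15 × 1024 × 1024 bytes is an assumption, since the exact byte count is not spelled out here. It illustrates the documented behavior and is not Googlebot's implementation.

```python
from __future__ import annotations

# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def split_at_limit(html: bytes, limit: int = LIMIT_BYTES) -> tuple[bytes, bytes]:
    """Return (kept, dropped): the bytes before the cutoff and the bytes after it."""
    return html[:limit], html[limit:]


def markers_past_cutoff(
    html: bytes, markers: list[bytes], limit: int = LIMIT_BYTES
) -> list[bytes]:
    """List markers whose first occurrence ends beyond the cutoff.

    A marker that starts before the limit but runs past it is also
    reported, because it would be cut in half.
    """
    lost = []
    for marker in markers:
        pos = html.find(marker)
        if pos != -1 and pos + len(marker) > limit:
            lost.append(marker)
    return lost
```

Passing markers such as b"application/ld+json" or the opening tag of your main content container gives a quick read on whether structured data or primary copy sits in the discarded tail.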

One reassuring detail: the limit applies only to the initial HTML response, not to the resources it references. Images and videos are fetched as their own requests, so a media-heavy page is not automatically at risk. The danger is markup bloat: content and data baked directly into the HTML, which also wastes crawl capacity and slows rendering. If crawl efficiency matters for your site, see our guide on crawl budget and when you should actually care.

How it gets flagged

During a crawl, SEO ProCheck measures the byte size of each HTML response it downloads and flags any single document over 15MB. This mirrors how desktop crawlers like Screaming Frog and Sitebulb report response size: they record the size of every fetched document, so an outlier stands out against the typical baseline of 30KB to a few hundred KB. The flag is about the HTML file alone, not the total page weight once images and scripts load.
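If you want to reproduce the measurement yourself, a rough Python sketch using only the standard library follows. It is not how SEO ProCheck or any named crawler works internally. The function names, the User-Agent string, and the reading of 15MB as 15 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
import gzip
import urllib.request

# Assumption: "15MB" is read as 15 * 1024 * 1024 bytes.
LIMIT_BYTES = 15 * 1024 * 1024


def decoded_size(body: bytes, content_encoding: str = "") -> int:
    """Size of the HTML after undoing gzip transfer encoding, if any."""
    if content_encoding.strip().lower() == "gzip":
        return len(gzip.decompress(body))
    return len(body)


def is_oversized(size: int, limit: int = LIMIT_BYTES) -> bool:
    """True when the document is strictly larger than the limit."""
    return size > limit


def fetch_html_size(url: str, timeout: float = 30.0) -> int:
    """Fetch one URL and return the size of its HTML body in bytes.

    urllib does not decompress responses on its own, so the body is
    decoded here according to the Content-Encoding header.
    """
    request = urllib.request.Request(
        url,
        headers={
            "Accept-Encoding": "gzip",
            "User-Agent": "html-size-check-sketch/0.1",  # hypothetical UA
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
        encoding = response.headers.get("Content-Encoding", "")
    return decoded_size(body, encoding)
```

Calling is_oversized(fetch_html_size(url)) on the flagged URL gives an independent check of the reported size. Only the single HTML response is measured, which matches the scope of the flag.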

How to fix it

The goal is to get the HTML response comfortably under 15MB. Work through these in order of likely impact:

  • Strip inline base64 data. Images and fonts encoded as data: URIs in the markup are the most common cause. Move them to real files and reference them by URL so they load as separate resources.
  • Trim oversized inline SVG. Large unoptimized inline SVG can add megabytes. Externalize it, or run it through an SVG optimizer and reference it with <img> or <use>.
  • Move bloated inline scripts and styles out. Large inline <script> and <style> blocks, including inlined JSON data dumps, belong in external files. For how Google renders script-heavy pages, see our JavaScript SEO and rendering guide.
  • Paginate or split enormous pages. A single page rendering tens of thousands of rows, comments, or products rarely serves users or crawlers. Break it into paginated views.
  • Lazy-load below-the-fold content. Load long lists and media on demand instead of dumping the entire dataset into the initial HTML. A lighter document also helps your Largest Contentful Paint.
  • Remove duplicated or generated markup. Templating bugs sometimes repeat sections or leak large hidden blocks. Audit the source for accidental repetition.
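
Before working through the list above, it helps to know which kind of inline content is taking up the bytes. The sketch below uses Python's standard html.parser to tally approximate bytes per source. The class name, function name, and bucket names are hypothetical. The totals are rough: buckets can overlap (a data: URI inside an inline SVG is counted in both), and closing tags are not counted.

```python
from collections import Counter
from html.parser import HTMLParser


class InlineWeightAudit(HTMLParser):
    """Approximate tally of bytes by inline source. A rough sketch, not a validator."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.weights = Counter()
        self._text_bucket = None  # set while inside an inline <script> or <style>
        self._svg_depth = 0       # greater than 0 while inside an inline <svg>

    def handle_starttag(self, tag, attrs):
        raw = self.get_starttag_text() or ""
        if tag == "svg":
            self._svg_depth += 1
        if self._svg_depth:
            self.weights["inline_svg"] += len(raw.encode("utf-8"))
        for _name, value in attrs:
            # Only inspect the start of the value; data: URIs can be huge.
            if value and value[:16].lstrip().lower().startswith("data:"):
                self.weights["data_uri"] += len(value.encode("utf-8"))
        if tag == "script" and not any(name == "src" for name, _ in attrs):
            self._text_bucket = "inline_script"
        elif tag == "style":
            self._text_bucket = "inline_style"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._text_bucket = None
        elif tag == "svg" and self._svg_depth:
            self._svg_depth -= 1

    def handle_data(self, data):
        size = len(data.encode("utf-8"))
        if self._text_bucket:
            self.weights[self._text_bucket] += size
        elif self._svg_depth:
            self.weights["inline_svg"] += size


def audit_inline_weight(html: str) -> Counter:
    """Parse an HTML string and return approximate bytes per inline source."""
    parser = InlineWeightAudit()
    parser.feed(html)
    parser.close()
    return parser.weights
```

Running audit_inline_weight(page_source).most_common() on the flagged page's source ranks the buckets by size, which suggests which fix to try first.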

After changes, re-crawl the URL and confirm the HTML response size has dropped well below the limit.

Have a page that is genuinely too large, or not sure whether this flag is real? We will diagnose it as part of a full technical audit.


FAQ

Is 15MB a lot for an HTML file?

Yes, enormously. Google reports the median HTML file is about 30KB, roughly 500 times smaller than 15MB. Almost no normal page approaches this, which is why hitting it usually points to a specific bloat problem rather than ordinary growth.

Do images and videos count toward the 15MB?

No. The 15MB limit applies only to the initial HTML response. Images, video, CSS, and JavaScript referenced in the page are fetched as separate requests, so a media-rich page is not automatically over the limit. The risk comes from content and data inlined into the HTML itself.

What happens to content past 15MB?

Googlebot keeps the first 15MB of the HTML and discards the rest of that response. Any markup, links, or structured data beyond the cutoff is not parsed for that URL, so it can go unseen and unindexed.

Could this be a false positive?

It can be. If the crawler measured a one-off export, debug view, feed, or download endpoint rather than a real indexable page, the flag may not matter. Because real cases are rare, confirm the offending URL is a page you actually want indexed before spending time on it.

Does compression fix it?

Not on its own. Google applies the limit to the uncompressed data, so gzip or Brotli shrinks the transfer but not the size that is measured against 15MB. The durable fix is reducing the actual HTML, so use the steps above to cut real bytes. A page sitting near the ceiling is a warning sign worth addressing early.
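
A quick way to see the gap between transfer size and the uncompressed size of the HTML is the short Python sketch below. The table markup is made up for illustration, and the exact ratio will vary with your content.

```python
import gzip


def raw_and_gzip_sizes(html: bytes) -> tuple:
    """Return (uncompressed_bytes, gzip_bytes) for one HTML payload."""
    return len(html), len(gzip.compress(html))


# Hypothetical repetitive markup, similar to a long server-rendered table.
row = b'<tr><td class="sku">ABC-123</td><td>In stock</td></tr>\n'
page = b"<table>" + row * 50_000 + b"</table>"

raw_bytes, wire_bytes = raw_and_gzip_sizes(page)
print(f"uncompressed: {raw_bytes:,} bytes, gzip: {wire_bytes:,} bytes")
```

Repetitive markup compresses very well, so a document can look small in a network panel that shows transfer size while the uncompressed HTML is still large.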

Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.
