Using the GSC URL Inspection API to Monitor Indexing at Scale

December 5, 2025
Analytics & Measurement


Google's URL Inspection API is the only programmatic way to read the same index, canonical, and coverage data that powers the URL Inspection tool inside Search Console. For sites with thousands of pages, clicking through that tool one URL at a time is impractical. This guide shows you how to query the API at scale, what each field actually means, and how to turn raw responses into an indexing monitoring system you can run on a schedule.

What the API returns and why it beats the Coverage report

The Coverage (Page Indexing) report in Search Console aggregates URLs into buckets, but it samples, lags, and rarely exposes the specific URL list behind a status. The URL Inspection API gives you per-URL truth pulled from Google's index, including fields the aggregate report hides.

A single call to urlInspection.index.inspect returns an inspectionResult whose indexStatusResult object carries the index data. The fields that matter most for monitoring:

verdict: PASS, FAIL, NEUTRAL, or PARTIAL. This is the headline status.
coverageState: the human-readable reason, e.g. "Submitted and indexed", "Crawled - currently not indexed", "Discovered - currently not indexed", "Duplicate without user-selected canonical".
googleCanonical vs userCanonical: Google's chosen canonical vs the one you declared. A mismatch here is the single most actionable signal the API offers.
indexingState: INDEXING_ALLOWED or a blocking reason like BLOCKED_BY_META_TAG or BLOCKED_BY_ROBOTS_TXT.
robotsTxtState, pageFetchState, lastCrawlTime, and crawledAs: whether robots.txt allowed the crawl, whether the fetch succeeded, when Google last crawled the URL, and which crawler it used (desktop vs mobile smartphone).
referringUrls and sitemap: where Google found the URL.

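The same payload can also carry richResultsResult and ampResult, so one call covers indexing and structured-data health together. A mobileUsabilityResult field still appears in the API schema, but Google has deprecated it, so do not build monitoring on it.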
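As a rough illustration of pulling those fields out of a response, here is a minimal sketch. It assumes the response is the plain dict returned by the Python client, with the index fields nested under inspectionResult.indexStatusResult. The helper name extract_index_status and the sample payload are made up for this example.

```python
from typing import Any, Dict

# Field names from the list above, as they appear in indexStatusResult.
INDEX_FIELDS = (
    "verdict", "coverageState", "googleCanonical", "userCanonical",
    "indexingState", "robotsTxtState", "pageFetchState",
    "lastCrawlTime", "crawledAs", "referringUrls", "sitemap",
)


def extract_index_status(response: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one inspect() response into a dict of monitoring fields.

    Fields that are absent from the response become None instead of
    raising, so a sparse response still produces a complete row.
    """
    status = response.get("inspectionResult", {}).get("indexStatusResult", {})
    return {field: status.get(field) for field in INDEX_FIELDS}


# Hypothetical response, trimmed to a few fields for illustration.
sample = {
    "inspectionResult": {
        "indexStatusResult": {
            "verdict": "PASS",
            "coverageState": "Submitted and indexed",
            "googleCanonical": "https://example.com/a",
            "userCanonical": "https://example.com/a",
        }
    }
}
row = extract_index_status(sample)
print(row["verdict"], row["coverageState"])
```

Returning None for missing fields is a design choice for this sketch: it keeps every row the same shape, which makes later diffing simpler.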

Setup: auth, scope, and the one hard limit

The API is part of the Search Console API surface, so authenticate with a Google Cloud service account or OAuth client that has the https://www.googleapis.com/auth/webmasters.readonly scope. The authenticated principal must be a verified user on the property you inspect. Add the service account email as a full or restricted user in Search Console settings, and use the exact property string (including the sc-domain: prefix for domain properties).

The constraint that shapes your entire architecture: the quota is 2,000 queries per day per property, with a short-term ceiling around 600 per minute. You cannot inspect a 50,000-URL site daily. Plan around the quota rather than fighting it. That is the central engineering problem this API presents.
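To make the quota arithmetic concrete, here is a small sketch. The function name full_cycle_days and the reserved_per_day parameter are illustrative assumptions; the 2,000/day figure is the limit quoted above.

```python
import math

DAILY_QUOTA = 2000  # per-property daily limit quoted above


def full_cycle_days(total_urls: int, reserved_per_day: int = 0,
                    daily_quota: int = DAILY_QUOTA) -> int:
    """Days needed to inspect every URL once, after setting aside part of
    the daily quota for other work (for example a fixed watchlist)."""
    available = daily_quota - reserved_per_day
    if available <= 0:
        raise ValueError("reserved_per_day must be smaller than daily_quota")
    return math.ceil(total_urls / available)


print(full_cycle_days(50_000))        # 25
print(full_cycle_days(50_000, 500))   # 34
```

So a 50,000-URL site needs about 25 days for one full pass even with the whole quota spent on it, and longer once part of each day's budget is reserved.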

A minimal Python request

Using the google-api-python-client library, a single inspection looks like this:

Build the service (after from googleapiclient.discovery import build): service = build('searchconsole', 'v1', credentials=creds)
Call it: service.urlInspection().index().inspect(body={'inspectionUrl': url, 'siteUrl': property_url, 'languageCode': 'en-US'}).execute()

The response nests everything under inspectionResult.indexStatusResult. Extract the fields you care about and write them to a row. The pattern that scales is a worker pool of 5 to 10 concurrent threads with retry-on-429 backoff, throttled to stay under the per-minute ceiling.
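The worker-pool-with-backoff pattern might look roughly like the sketch below. It is deliberately independent of the Google client so it can run on its own: RateLimited is a hypothetical exception that your inspect_one wrapper would raise when the API answers 429 (with google-api-python-client, that would mean catching googleapiclient.errors.HttpError and checking its resp.status). The names call_with_backoff and inspect_many are also illustrative. There is no explicit per-minute throttle here; that is left out for brevity.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


class RateLimited(Exception):
    """Hypothetical exception: raise this when the API answers HTTP 429."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, from Retry-After if present


def call_with_backoff(fn: Callable[[], dict], max_retries: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run fn(), retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure, never hide it
            # Prefer the server's Retry-After hint when one was supplied.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay + random.uniform(0, 0.25))
    raise RuntimeError("max_retries must be >= 0")


def inspect_many(urls: Iterable[str], inspect_one: Callable[[str], dict],
                 max_workers: int = 5) -> Dict[str, dict]:
    """Inspect URLs concurrently; returns {url: response}."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(
            lambda u: call_with_backoff(lambda: inspect_one(u)), url_list)
        return dict(zip(url_list, results))


# Stand-in for the real API call, so the sketch runs without credentials.
def fake_inspect(url: str) -> dict:
    return {"inspectionResult": {"indexStatusResult": {"verdict": "PASS"}}}


print(len(inspect_many(["https://example.com/a", "https://example.com/b"],
                       fake_inspect)))
```

One caveat when wiring in the real client: its documentation notes that the underlying httplib2 transport is not thread-safe, so each worker thread may need its own service or http object rather than a shared one.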

Prioritizing which URLs to inspect

Because you can only afford ~2,000 inspections a day, never inspect your whole site blindly. Build a priority queue:

New and recently changed URLs: pages published or updated in the last few days, pulled from your CMS or sitemap lastmod. Confirm they get indexed.
Revenue and conversion pages: a fixed watchlist inspected every run regardless of other signals.
Pages with traffic anomalies: cross-reference the Search Analytics API. Any URL whose clicks dropped sharply gets inspected to check for a coverage or canonical change.
A rotating sample of the long tail: cycle through the remaining inventory so every URL is checked every N days. At 2,000/day a 30,000-URL site gets full coverage roughly every 15 days if the whole quota goes to the rotation (longer once the tiers above take their share), which is fine for a baseline.

Store a last_inspected timestamp per URL and let the scheduler pick the oldest, highest-priority candidates each run.
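One way to sketch that scheduler is below. The Candidate shape, the numeric priority scale (0 = most urgent), and the pick_batch name are assumptions for illustration; only the last_inspected idea comes from the text above. Timestamps are assumed to be timezone-aware.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Candidate:
    url: str
    priority: int                       # 0 = most urgent tier (hypothetical scale)
    last_inspected: Optional[datetime]  # None = never inspected; must be tz-aware


# Sorts before any real timestamp, so never-inspected URLs go first in a tier.
_NEVER = datetime.min.replace(tzinfo=timezone.utc)


def pick_batch(candidates: List[Candidate], budget: int) -> List[Candidate]:
    """Return up to `budget` candidates: most urgent tier first, and within
    a tier the URL that has gone longest without an inspection."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.priority, c.last_inspected or _NEVER),
    )
    return ranked[:budget]


now = datetime(2025, 12, 5, tzinfo=timezone.utc)
queue = [
    Candidate("/old-tail", 3, now - timedelta(days=20)),
    Candidate("/checkout", 1, now - timedelta(days=1)),
    Candidate("/new-post", 0, None),
    Candidate("/tail", 3, now - timedelta(days=2)),
]
for c in pick_batch(queue, budget=3):
    print(c.url)
```

Strict tier ordering like this can starve the long tail when the upper tiers alone exceed the budget, so a real scheduler would probably cap each tier's share of the daily quota.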

Turning responses into monitoring signals

Raw verdicts are not alerts. Persist every inspection to a database (one row per URL per run) and derive state transitions. The conditions worth alerting on:

Canonical mismatch: googleCanonical != userCanonical and the user canonical is self-referential. Google is overriding your canonical, often consolidating the page into a near-duplicate. This silently removes pages from the index.
Indexed → not indexed: a URL whose coverageState flips from "Submitted and indexed" to "Crawled - currently not indexed". This is your earliest warning of quality demotion or thin-content suppression.
Stuck in discovery: "Discovered - currently not indexed" persisting across runs signals crawl-budget or quality starvation, common on large or scaled-content sites.
Unexpected noindex/robots block: indexingState becomes BLOCKED_BY_META_TAG or robotsTxtState changes to DISALLOWED, usually a deploy regression. These deserve a same-day alert.
Crawl failures: pageFetchState anything other than SUCCESSFUL.

Compute these as diffs against the previous row for each URL, then push only the transitions to Slack, email, or a dashboard. Logging current state without diffing buries the signal.
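A minimal sketch of that diffing step follows, assuming each stored row is a dict keyed by the API's field names. The detect_transitions name and the alert wording are illustrative. The "stuck in discovery" condition is omitted because it needs more than two rows of history.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]  # one stored inspection, keyed by API field name

INDEXED = "Submitted and indexed"


def _canonical_mismatch(row: Row) -> bool:
    google, user = row.get("googleCanonical"), row.get("userCanonical")
    return bool(google and user and google != user)


def detect_transitions(prev: Row, curr: Row) -> List[str]:
    """Compare two consecutive inspections of one URL and return alert
    messages only for conditions that are new in `curr`."""
    alerts: List[str] = []
    if _canonical_mismatch(curr) and not _canonical_mismatch(prev):
        alerts.append(f"canonical mismatch: Google chose {curr['googleCanonical']}")
    if prev.get("coverageState") == INDEXED and curr.get("coverageState") != INDEXED:
        alerts.append(f"left the index: now {curr.get('coverageState')!r}")
    if (curr.get("indexingState") not in (None, "INDEXING_ALLOWED")
            and curr.get("indexingState") != prev.get("indexingState")):
        alerts.append(f"indexing blocked: {curr['indexingState']}")
    if (curr.get("robotsTxtState") == "DISALLOWED"
            and prev.get("robotsTxtState") != "DISALLOWED"):
        alerts.append("robots.txt now disallows this URL")
    if (curr.get("pageFetchState") not in (None, "SUCCESSFUL")
            and curr.get("pageFetchState") != prev.get("pageFetchState")):
        alerts.append(f"crawl failure: {curr['pageFetchState']}")
    return alerts


before = {
    "coverageState": INDEXED, "indexingState": "INDEXING_ALLOWED",
    "robotsTxtState": "ALLOWED", "pageFetchState": "SUCCESSFUL",
    "googleCanonical": "https://example.com/a",
    "userCanonical": "https://example.com/a",
}
after = dict(before, coverageState="Crawled - currently not indexed",
             googleCanonical="https://example.com/b")
for message in detect_transitions(before, after):
    print(message)
```

The canonical check here is a plain string comparison, so the canonicals should be normalized before rows are compared, or trivial differences will raise false alerts.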

Schema for storing results

A flat table is enough. Index it on (site_url, inspection_url, inspected_at) and store at minimum: verdict, coverage_state, google_canonical, user_canonical, indexing_state, robots_txt_state, page_fetch_state, last_crawl_time, crawled_as, and the raw JSON for anything you query later. Keeping the raw response means you never have to re-spend quota to backfill a field you forgot to extract.
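As one possible shape for that table, here is a sketch using SQLite from the Python standard library. The table name, index name, and helper functions are assumptions; the columns follow the list above.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS inspections (
    site_url         TEXT NOT NULL,
    inspection_url   TEXT NOT NULL,
    inspected_at     TEXT NOT NULL,   -- ISO 8601 timestamp of this run
    verdict          TEXT,
    coverage_state   TEXT,
    google_canonical TEXT,
    user_canonical   TEXT,
    indexing_state   TEXT,
    robots_txt_state TEXT,
    page_fetch_state TEXT,
    last_crawl_time  TEXT,
    crawled_as       TEXT,
    raw_json         TEXT NOT NULL    -- full response, for later backfills
);
CREATE INDEX IF NOT EXISTS idx_inspections_lookup
    ON inspections (site_url, inspection_url, inspected_at);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_inspection(conn: sqlite3.Connection, site_url: str,
                    inspection_url: str, inspected_at: str,
                    response: dict) -> None:
    """Store one row per URL per run, plus the untouched raw response."""
    s = response.get("inspectionResult", {}).get("indexStatusResult", {})
    conn.execute(
        "INSERT INTO inspections VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (site_url, inspection_url, inspected_at,
         s.get("verdict"), s.get("coverageState"),
         s.get("googleCanonical"), s.get("userCanonical"),
         s.get("indexingState"), s.get("robotsTxtState"),
         s.get("pageFetchState"), s.get("lastCrawlTime"),
         s.get("crawledAs"), json.dumps(response)),
    )
    conn.commit()


conn = open_db()
save_inspection(
    conn, "sc-domain:example.com", "https://example.com/a",
    "2025-12-05T00:00:00Z",
    {"inspectionResult": {"indexStatusResult": {
        "verdict": "PASS", "coverageState": "Submitted and indexed"}}},
)
print(conn.execute("SELECT verdict, coverage_state FROM inspections").fetchall())
```

Any relational store would do. SQLite is used here only because it ships with Python and needs no setup.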

Common mistakes

Treating it like an indexing-request API. Inspection is read-only. To request indexing, use the separate Indexing API (and only for JobPosting/BroadcastEvent per Google's terms) or submit sitemaps.
Inspecting the whole site daily. You will exhaust quota by mid-morning and get nothing useful. Prioritize ruthlessly.
Ignoring lastCrawlTime. A "PASS" verdict from a crawl three months ago tells you little about the current page. Weight freshness into your alerting.
Not handling 429s. Burst past the per-minute limit and calls fail silently in naive scripts. Implement exponential backoff and respect Retry-After.
Using the wrong property string. Domain properties require the sc-domain: prefix; URL-prefix properties need the exact protocol and trailing slash. A mismatch returns a permission error, not a helpful message.
Comparing canonicals without normalizing. Trailing slashes, protocol, and parameter order cause false mismatch alerts. Normalize both sides before diffing.
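For the canonical normalization point, a sketch using only urllib.parse could look like this. The function names and the exact rules (treating http and https as equivalent, dropping trailing slashes, sorting parameters) are assumptions to adapt to your own site's URL policy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str, force_https: bool = True) -> str:
    """Reduce a URL to a comparable form: lowercase host, optional https,
    no trailing slash (except the root), sorted query parameters, and no
    fragment."""
    parts = urlsplit(url.strip())
    scheme = "https" if force_https else parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((scheme, netloc, path, query, ""))


def canonicals_match(google_canonical: str, user_canonical: str) -> bool:
    return normalize_url(google_canonical) == normalize_url(user_canonical)


print(normalize_url("HTTP://Example.com/a/?b=2&a=1"))
# https://example.com/a?a=1&b=2
print(canonicals_match("https://example.com/a", "http://example.com/a/"))
# True
```

If your site treats a trailing slash or the protocol as significant, loosen these rules, otherwise real mismatches will be normalized away.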

FAQ

How fresh is the data? It reflects Google's last index/crawl of the URL, not a live re-crawl. The tool's "Live Test" feature is not exposed via the API, so you only get the indexed snapshot.

Can I raise the 2,000/day quota? No. It is a fixed per-property limit and not adjustable through Cloud Console, which is why multi-property accounts and prioritization matter.

Does it work on domain properties? Yes, as long as you pass the sc-domain: property string and the principal is verified on it.

Built this way, the API becomes a daily indexing health monitor: it catches canonical hijacks, noindex regressions, and quality-driven deindexing days or weeks before they surface in your traffic reports.


Claude Vincent

Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.
