Speakable Schema and the Voice-Answer Opportunity

March 24, 2025
Technical SEO

Speakable structured data is one of the most misunderstood items in the schema.org vocabulary. It's still marked as beta, it's currently limited to news content, and yet it remains a genuine lever for surfacing your content through Google Assistant audio responses. Used correctly, it tells a voice assistant exactly which sentences on a page are worth reading aloud. Used carelessly, it does nothing, or worse, marks up text that sounds incoherent when spoken.

What Speakable Schema Actually Does

The speakable property belongs to the Article and WebPage types in schema.org. It identifies the sections of a page best suited for audio playback through text-to-speech (TTS). When a user asks Google Assistant a question and your page is a strong match, the assistant can read the speakable portions aloud, attribute the answer to your site, and send a link to the user's mobile device.

Two things matter here and are routinely ignored:

It is not a general-purpose voice SEO tag. Google's documentation restricts Speakable to news publishers, and eligibility is gated to sites that produce news content. If you run an ecommerce store or a SaaS blog, you are technically outside the supported use case, even though the markup will validate.
It selects existing content; it does not create answers. You point Speakable at sentences already on the page. If those sentences don't stand alone as a spoken answer, the markup is worthless.

The Two Ways to Specify Speakable Content

Speakable accepts a SpeakableSpecification object, and you target content using either CSS selectors or XPath. CSS selectors are almost always the better choice because they survive template changes more gracefully and are easier to audit.

CSS Selector Approach

You reference DOM elements by their id or class. This is the recommended method.

{
  "@context": "https://schema.org/",
  "@type": "WebPage",
  "name": "City Council Approves 2026 Transit Budget",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": ["#headline", "#summary"]
  },
  "url": "https://example.com/transit-budget-2026"
}

Here, #headline and #summary must correspond to elements that actually exist on the rendered page. The TTS engine reads the text content of those elements in the order listed.

XPath Approach

Use XPath only when you cannot add stable IDs or classes to the markup, for instance in a locked-down CMS template.

"speakable": {
  "@type": "SpeakableSpecification",
  "xpath": [
    "/html/head/title",
    "/html/body/div/p[1]"
  ]
}

XPath is brittle. A single structural change to the DOM breaks the path silently, and nobody notices until the voice eligibility quietly disappears. Prefer CSS selectors and reserve XPath as a fallback.
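
The brittleness is easy to reproduce. The sketch below is a minimal illustration, assuming two hypothetical, well-formed page fragments and using Python's standard-library ElementTree, which supports only a subset of XPath and resolves paths relative to the root element. The resolve helper and both fragments are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragments. Real pages would need a proper HTML parser.
BEFORE = (
    "<html><head><title>Transit budget approved</title></head>"
    "<body><div><p>City council approved the 2026 transit budget.</p></div></body></html>"
)

# The same content after a redesign wraps the article in <main>.
AFTER = (
    "<html><head><title>Transit budget approved</title></head>"
    "<body><main><div><p>City council approved the 2026 transit budget.</p></div></main></body></html>"
)

# ElementTree paths are relative to the root <html> element,
# so this corresponds to the absolute XPath /html/body/div/p[1].
PATH = "body/div/p[1]"


def resolve(markup: str, path: str):
    """Return the text at `path`, or None when the path no longer matches."""
    node = ET.fromstring(markup).find(path)
    return None if node is None else node.text


print(resolve(BEFORE, PATH))  # the summary sentence
print(resolve(AFTER, PATH))   # None: the path broke without raising any error
```

Nothing fails loudly in the second call, which is the point: the path simply stops matching.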

Writing Content That Survives Being Read Aloud

This is where most implementations fail. The markup validates, but the spoken result is awful. Google's own guidance recommends keeping each speakable section to roughly 20 to 30 seconds of audio, which is about two to three sentences. Beyond that length guideline, there are editorial rules that make the difference between a usable answer and noise.

Each speakable sentence must stand alone. A listener has no headline, no preceding paragraph, no image caption for context. "It rose 12% after the announcement" is meaningless aloud. "City transit ridership rose 12% after the fare reduction announcement" works.
Lead with the answer, not the setup. The first speakable sentence should resolve the likely question. Inverted-pyramid news writing is naturally well suited to this.
Strip anything that doesn't vocalize. Do not mark up sections containing datelines, bylines, photo credits, "Read more" links, or parenthetical asides. They sound like errors when spoken.
Avoid abbreviations and symbols that TTS mangles. Spell out ambiguous items. "St." reads as "Saint" or "Street" unpredictably; "$5M" may be read character by character.
Keep numbers clean. Round figures and full units read better than precise decimals.
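
Some of these rules can be approximated with a rough pre-publish check. The following is a sketch only: the 150 words-per-minute speaking rate, the list of context-dependent openers, the abbreviation list, and the lint_speakable name are all assumptions made for illustration, not anything Google publishes. It is no substitute for reading the text aloud.

```python
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate, not a Google figure
MAX_SECONDS = 30        # upper end of the 20 to 30 second guideline
# Assumed list of openers that usually need prior context to make sense.
CONTEXT_OPENERS = {"it", "this", "that", "they", "he", "she", "these", "those"}


def lint_speakable(text: str) -> list[str]:
    """Return human-readable warnings for a candidate speakable summary."""
    warnings = []
    seconds = len(text.split()) / WORDS_PER_MINUTE * 60
    if seconds > MAX_SECONDS:
        warnings.append(f"about {seconds:.0f}s of audio, over {MAX_SECONDS}s")
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = sentence.split()
        first = tokens[0].strip("\"'").lower() if tokens else ""
        if first in CONTEXT_OPENERS:
            warnings.append(f"opens with a context-dependent word: {sentence!r}")
    if re.search(r"[$%()]", text):
        warnings.append("contains symbols or parentheses that may not vocalize well")
    if re.search(r"\b(?:St|Dr|Ave)\.", text):
        warnings.append("contains an ambiguous abbreviation")
    return warnings


for warning in lint_speakable("It rose 12% after the announcement."):
    print(warning)
```

A summary that comes back with no warnings is not guaranteed to sound good; one that comes back with several almost certainly needs a rewrite.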

A practical workflow: dedicate a <div id="speakable-summary"> near the top of your article template, write a two-sentence, context-complete summary into it for every article, and point the Speakable selector at that single element. This gives editors one clear field to fill and makes the spoken output predictable.
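
In template code, that workflow might look like the sketch below. It assumes a Python rendering step and a hypothetical render_speakable helper; the JSON-LD shape mirrors the WebPage example earlier in this article, and the speakable-summary id is the one suggested above.

```python
import html
import json

SUMMARY_ID = "speakable-summary"


def render_speakable(name: str, url: str, summary: str) -> tuple[str, str]:
    """Return (summary_div, jsonld_script) for one article."""
    # Escape the editor-written summary before placing it in the page.
    div = f'<div id="{SUMMARY_ID}">{html.escape(summary)}</div>'
    data = {
        "@context": "https://schema.org/",
        "@type": "WebPage",
        "name": name,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": [f"#{SUMMARY_ID}"],
        },
        "url": url,
    }
    # Escape "<" so a headline containing "</script>" cannot end the element early.
    payload = json.dumps(data, indent=2).replace("<", "\\u003c")
    script = f'<script type="application/ld+json">\n{payload}\n</script>'
    return div, script


div, script = render_speakable(
    "City Council Approves 2026 Transit Budget",
    "https://example.com/transit-budget-2026",
    "The city council approved the 2026 transit budget on Monday.",
)
print(div)
print(script)
```

Because the selector is derived from the same constant as the element id, the two cannot drift apart within this template.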

Which Content Genuinely Benefits

Voice-answer eligibility rewards a narrow band of content. Be honest about whether yours qualifies before investing effort.

Breaking and developing news. The original and supported use case. Time-sensitive factual updates are exactly what people ask assistants about.
Concise factual answers. "Who won," "what was the score," "when does it open," "what was the decision." Content that resolves a single discrete question in a sentence or two.
Recurring, structured updates. Weather summaries, market closes, sports results, event outcomes, anything with a stable template and a clear lead fact.

Content that does not benefit: long-form analysis, opinion pieces, listicles, product pages, comparison guides, and anything where the value is in the full reading rather than a quotable summary. Marking these up doesn't help, and it tells anyone auditing the site that schema is being applied by reflex.

Common Mistakes

Applying Speakable to non-news pages. The markup validates everywhere, which lulls people into thinking it's working. Eligibility is gated; a SaaS blog will see no voice surfacing regardless.
Pointing selectors at content not present in the static HTML. If your summary is injected by client-side JavaScript after load, the selector may not resolve when Google renders. Verify the target exists in the rendered DOM.
Marking up the entire article body. A selector pointing at .article-content tells the assistant to read 1,500 words aloud. It won't, and you've signaled a misunderstanding of the property.
Letting selectors drift. A template redesign renames #summary to #lede and the Speakable spec silently breaks. Add selector existence to your schema regression checks.
Duplicating speakable across the headline and the body. If #headline and the first body sentence say the same thing, the assistant reads it twice. Choose distinct, complementary sections.
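
Selector drift in particular lends itself to an automated check. The sketch below is one possible shape for it, assuming static HTML (no JavaScript rendering) and plain #id selectors only; the missing_speakable_ids name is hypothetical, and class or compound selectors are deliberately skipped rather than evaluated.

```python
import json
import re
from html.parser import HTMLParser


class _PageScan(HTMLParser):
    """Collect element ids and JSON-LD script bodies from static HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.jsonld = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self.jsonld.append("")

    def handle_data(self, data):
        if self._in_jsonld:
            self.jsonld[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False


def missing_speakable_ids(page_html: str) -> list[str]:
    """Return '#id' selectors named in speakable markup but absent from the HTML."""
    scan = _PageScan()
    scan.feed(page_html)
    scan.close()
    missing = []
    for raw in scan.jsonld:
        data = json.loads(raw)
        spec = data.get("speakable") if isinstance(data, dict) else None
        if not isinstance(spec, dict):
            continue
        selectors = spec.get("cssSelector", [])
        if isinstance(selectors, str):
            selectors = [selectors]
        for sel in selectors:
            # Only plain "#id" selectors are checked; anything else is skipped.
            if re.fullmatch(r"#[\w-]+", sel) and sel[1:] not in scan.ids:
                missing.append(sel)
    return missing
```

Run against a page where a redesign renamed #summary to #lede, the function would return ["#summary"], which is enough to fail a build.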

Validating Your Implementation

There is no dedicated Speakable report in Google's Rich Results Test, and Search Console does not surface a Speakable enhancement panel. Validate in two layers:

Syntax: Run the JSON-LD through the schema.org validator (validator.schema.org) to confirm the SpeakableSpecification is well-formed and nested correctly inside WebPage or Article.
Target resolution: Open the live page, run your CSS selector in the browser console with document.querySelectorAll('#summary'), confirm it matches the element you expect, and read that element's text content aloud yourself. If it sounds like a complete, standalone answer, you're done. If it needs context to make sense, rewrite the source content, not the schema.
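
The syntax layer can also be pre-checked locally before pasting into the validator. This is a minimal sketch of such a shape check, not a replacement for validator.schema.org: the check_speakable name is hypothetical, and the accepted type set covers only the two types named above, so Article subtypes such as NewsArticle would need to be added.

```python
import json

# Only the two types named in this article; extend for subtypes such as NewsArticle.
SUPPORTED_TYPES = {"WebPage", "Article"}


def check_speakable(jsonld_text: str) -> list[str]:
    """Return a list of structural problems; an empty list means the shape looks right."""
    try:
        data = json.loads(jsonld_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems = []
    types = data.get("@type")
    types = {types} if isinstance(types, str) else set(types or [])
    if not types & SUPPORTED_TYPES:
        problems.append("@type is not WebPage or Article")

    spec = data.get("speakable")
    if not isinstance(spec, dict):
        problems.append("speakable is missing or not an object")
        return problems
    if spec.get("@type") != "SpeakableSpecification":
        problems.append("speakable @type is not SpeakableSpecification")
    if not (spec.get("cssSelector") or spec.get("xpath")):
        problems.append("speakable has neither cssSelector nor xpath")
    return problems
```

An empty result only says the nesting is plausible. Whether the selected text works as a spoken answer still has to be judged by ear.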

The Bottom Line

Speakable schema is a precision instrument, not a checkbox. Its value is concentrated in news and concise-answer content, it depends entirely on whether your marked-up sentences hold up when spoken without context, and it rewards a disciplined template with one clean summary field over scattered selectors across a page. Implement it where it fits, write the summaries to be heard rather than read, and keep your selectors under regression testing. Applied that way, it's a low-cost path to attributed voice answers; applied as a reflex, it's inert markup that quietly does nothing.

Claude Vincent

Claude Vincent is a technical SEO consultant focused on crawlability, rendering, and AI-search visibility. He writes the field guides and case studies at SEO ProCheck, with a bias toward the durable, unglamorous work that decides whether search engines and AI answer engines can actually read and cite a site.
