Here’s what the underlying technology actually does — and what that means for your content.
When AI search responds to a query, it doesn’t retrieve your page and read it top to bottom.
It retrieves fragments of it. Short, semantically relevant passages — typically a few hundred words — and generates its answer from those alone.
Understanding why changes how you think about content. Here’s how it works.
What’s actually happening under the hood
AI search systems like Google’s AI Overviews almost certainly use a technique called Retrieval-Augmented Generation. RAG, for short.
(Note: “almost certainly” is doing some work there. MIT Technology Review noted in 2024 that Google’s spokesperson wouldn’t officially confirm whether AI Overviews uses RAG. The architecture is strongly implied by how the API behaves, but Google hasn’t put it in writing.)
RAG isn’t a new idea. Patrick Lewis and his team — then at Facebook AI Research, University College London, and New York University — published the foundational paper in 2020. It was accepted at NeurIPS, one of the most competitive machine learning venues. This is well-established computer science, not SEO theory.
The basic mechanics are simple:
Your content gets broken into small pieces — chunks. When someone asks a question, the system finds the chunks most semantically similar to that question and passes them to the language model. The model generates its answer from those chunks alone.
Not your page. Not your site. The relevant fragments.
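The mechanics above can be sketched in a few lines. This is a toy illustration only: production systems embed chunks with learned models and measure chunks in tokens, whereas this version uses word counts and bag-of-words overlap as a stand-in.

```python
import math
from collections import Counter

def chunk(text, size=40):
    """Split text into fixed-size word chunks (real systems count tokens)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def vector(text):
    """Bag-of-words vector; a crude stand-in for a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, pages, top_k=3):
    """Score every chunk from every page against the query; keep the best few."""
    chunks = [c for page in pages for c in chunk(page)]
    q = vector(query)
    return sorted(chunks, key=lambda c: cosine(q, vector(c)), reverse=True)[:top_k]
```

The language model then sees only what `retrieve` returns — never the pages those chunks came from.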
The Google Gemini API makes this visible. When grounding is enabled, the API returns a groundingMetadata object containing groundingChunks — discrete text extracts from source URLs. You can see exactly what the model was given to work with. It isn’t much.
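If you call the API yourself, that metadata is just JSON. A minimal sketch of pulling the source list out of a grounded response — the field names follow the public API documentation, but the payload below is a hand-built illustration, not a captured response:

```python
# Illustrative shape of a grounded response (hand-built sample, not real data;
# the actual payload carries more fields than shown here).
response = {
    "groundingMetadata": {
        "groundingChunks": [
            {"web": {"uri": "https://example.com/a", "title": "Source A"}},
            {"web": {"uri": "https://example.com/b", "title": "Source B"}},
        ]
    }
}

def list_grounding_sources(resp):
    """Pull out the titles and URIs of the extracts the model was given."""
    chunks = resp.get("groundingMetadata", {}).get("groundingChunks", [])
    return [(c["web"]["title"], c["web"]["uri"]) for c in chunks if "web" in c]
```

Run that against a real grounded call and the brevity of each extract is hard to miss.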
How short is short?
Peer-reviewed research on RAG systems consistently shows that typical chunk sizes fall between 64 and 1,024 tokens depending on query type — roughly 50 to 750 words. A 2025 clinical AI study indexed in PubMed Central, using Google’s own Gemini model as the backbone, found that adaptive chunking systems target around 500 words per chunk as their optimal unit.
In practice, the per-source figure is likely somewhere in the low hundreds of words. The total context passed to the model across all sources combined is probably in the range of 1,500 to 2,000 words — distributed across three to six sources depending on the query.
Google hasn’t published exact numbers for consumer Search. But the API behaviour and the academic literature are consistent: short passages, not full pages.
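The token-to-word arithmetic is worth making explicit. A quick sketch, assuming the common rule of thumb of roughly 0.75 English words per token — an approximation, since real tokenizers vary by model and language:

```python
# Rough rule of thumb for English text: ~0.75 words per token.
# An approximation, not an exact rate; real tokenizers vary.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens, words_per_token=WORDS_PER_TOKEN):
    return round(tokens * words_per_token)

for tokens in (64, 512, 1024):
    print(f"{tokens} tokens ≈ {tokens_to_words(tokens)} words")
# 64 → 48, 512 → 384, 1024 → 768: roughly the 50-750 word range above.
```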
Why this is obvious once you understand the architecture
When you understand RAG, the short-fragment behaviour stops being surprising and starts being inevitable.
The whole point of the architecture is to retrieve the most relevant passage, not the most comprehensive page. Shorter, denser, more directly relevant chunks score better in the semantic similarity calculation that determines what gets retrieved.
A 2025 peer-reviewed paper by Klesel and Wittmann in Business & Information Systems Engineering (Springer) gives this a name: the “Blinkered Chunk Effect.” Extract a paragraph from a large document and it loses the context that makes it meaningful. That’s a genuine, acknowledged limitation of RAG systems, and it applies whether the document is a web page, a medical record, or a legal contract.
The chunk selection is driven by semantic similarity between a query and a passage. That’s a mathematical operation. It isn’t waiting to reward your H2 tags.
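You can see the effect with the similarity calculation itself. A toy comparison, with word overlap standing in for real embeddings — a deliberate oversimplification — showing that a chunk which answers directly outscores one padded with preamble:

```python
import math
from collections import Counter

def cosine(a_text, b_text):
    """Cosine similarity over bag-of-words counts (toy embedding stand-in)."""
    a, b = Counter(a_text.lower().split()), Counter(b_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

query = "how long should a rag chunk be"

dense = "a rag chunk should be a few hundred words long"
padded = ("before we talk about how long a rag chunk should be some history "
          "search engines have evolved considerably over the past two decades "
          "and content strategy has had to evolve with them")

print(cosine(query, dense) > cosine(query, padded))  # → True
```

Same answer in both passages; the preamble dilutes the second one’s score.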
The practical implication is simpler than it first appears.
Write clearly. Answer questions directly. Make every section self-contained. Don’t bury the point in a preamble.
That’s not a new content strategy. It’s just good writing — which happens to also be what RAG retrieves well.
What you can actually do
Three things, none of which require specialist help:
Make every section stand on its own. If a paragraph extracted in isolation would lose its meaning — because it references “the above method” or “as mentioned earlier” — fix that. Name your subjects explicitly. The chunk doesn’t carry context with it.
Front-load your answers. The retriever scores semantic similarity between a query and a chunk. A chunk that leads with a direct answer scores better than one that builds to it. This is also just better writing.
Stop thinking about page length as a proxy for quality. Research from Chroma and the arXiv literature indicates that longer pages don’t get more representation. A 5,000-word page and a 600-word page are equally likely to contribute a single relevant chunk. What matters is whether the relevant passage exists, and how well it matches the query.
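The first of those checks is mechanical enough to automate. A rough sketch that flags paragraphs leaning on context an extracted chunk won’t carry — the phrase list is an illustrative starting point, not exhaustive:

```python
import re

# Phrases that only make sense with the surrounding page; an extracted
# chunk loses whatever they point at. Illustrative list, not exhaustive.
DANGLING = [
    r"\bas mentioned (earlier|above)\b",
    r"\bthe above (method|section|example)\b",
    r"\bsee below\b",
    r"\bas we saw\b",
]

def dangling_references(paragraph):
    """Return the context-dependent phrases found in a paragraph."""
    return [m.group(0) for pat in DANGLING
            for m in re.finditer(pat, paragraph.lower())]

print(dangling_references("As mentioned earlier, the above method works."))
# → ['as mentioned earlier', 'the above method']
```

Anything it flags is a paragraph that won’t survive extraction intact.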
What you probably can’t change
The grounding budget — the total context window allocated to a search query across all sources — is constrained at the system level. Google sets it. You don’t.
Whether your content is retrieved at all depends on where you rank in the first place. The same ranking signals that determine organic visibility also influence which sources get included in AI grounding. There’s no evidence of a separate, optimisable AI visibility layer that operates independently of how well your content performs in traditional search.
If you’re not ranking, you’re not being chunked.
The honest summary
AI search reads fragments of your page. Typically a few hundred words per source. The total context across all sources is probably in the range of 1,500 to 2,000 words — though Google hasn’t confirmed exact figures for consumer Search, so treat that as directional.
This is how the architecture works. It’s not a bug, a penalty, or a crisis.
The practical implication is to write clearly enough that any extracted paragraph makes sense without the rest of the page around it. That’s a reasonable goal.
It also just happens to be what good writing looks like anyway.
Sources: Lewis et al. (2020), arXiv:2005.11401 — Google Gemini API documentation — Klesel & Wittmann (2025), Business & Information Systems Engineering — Yu et al. (2025), arXiv:2505.21700 — PMC (2025), PubMed Central — MIT Technology Review (2024)