AIO citations do not favor structurally complete pages
We pulled the pages Google cited in AI Overviews for 22 explain queries and measured ContentGrapher structural completeness on 217 pages: 135 AIO-cited and 82 uncited pages ranking organically for the same queries. Then we checked whether the cited pages scored higher on four structural metrics.
They did not. The coverage score gap between cited and uncited pages was 0.011, and its confidence interval includes zero. On three of the four metrics we measured, the two groups are statistically indistinguishable; the fourth is borderline.
That is the finding. The rest of this page explains what we measured, why the null is mechanistically expected, and the one directional signal worth noting.
Query-level bootstrap CI (n=5,000 resamples, 17 qualifying queries). Coverage score and question coverage rate are 0–1 scale; % well-integrated and core concept count shown at natural scale.
Each row is scaled independently. Filled dot = borderline signal (CI barely includes zero). Open dot = not significant.
What we measured and why
The question behind this study was practical: if Google's AI Overview cites a page, does that page cover its topic more completely than a page that ranks organically but gets left out? If the answer were yes, improving structural completeness would be a measurable path to AIO citation. If the answer is no, the mechanism is elsewhere.
We collected a snapshot of 22 “what is” queries across technology, marketing, business, education, and finance where Google showed an AI Overview with at least two cited sources. For each query we identified the AIO-cited pages (Condition A) and the organic pages that ranked in the top 10 but were not cited anywhere in our corpus (Condition B). We then ran ContentGrapher Phase 1 analysis on every URL to measure four things: coverage score, core concept count, the proportion of core concepts that are well-integrated rather than merely mentioned, and question coverage rate across eight diagnostic dimensions.
All 217 pages were analyzed without a human audience spec, since the queries are generic explain queries with no meaningful audience differentiation. Phase 1 uses no credits.
The distributions overlap almost completely
The summary statistics show why the gap is so small. Both conditions average around 0.50 on coverage score, share the same median, and have the same fraction of pages scoring above 0.60. The assumption that AIO-cited pages are structurally better is not supported by what we measured.
Each dot is one URL. Vertical scatter is jitter to show density; horizontal position is coverage score. Means are nearly identical.
Coverage score distribution (0–1 scale). URL-level raw values before query-level normalization.
The direction flips across queries
The aggregate gap of 0.011 masks a large amount of per-query variance. In 11 of the 17 qualifying queries, the cited pages scored higher on coverage score; in the other 6, the uncited pages did. The gaps range from +0.149 (brand positioning, cited pages better) to −0.166 (service mesh, uncited pages better). When the sign flips in more than a third of observations, the aggregate is telling you the average is near zero, not that there is a consistent pattern.
22 queries met the inclusion criteria (all had AIO + ≥2 citations + ≥1 uncited organic result); 5 were excluded from the query-level analysis (n < 3 in one condition), leaving 17.
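The per-query comparison described above can be sketched in a few lines. This is a minimal illustration, not the study's actual pipeline: it assumes the gap for a query is the mean score of its cited pages (Condition A) minus the mean score of its uncited pages (Condition B), and it applies the n < 3 exclusion rule from the caption. The function names, the row format, and every score in `rows` are hypothetical and made up for the example.

```python
from statistics import mean

MIN_PER_CONDITION = 3  # queries with n < 3 in either condition are excluded


def query_gaps(rows):
    """Return {query: mean(cited) - mean(uncited)} for qualifying queries.

    rows: iterable of (query, condition, score) with condition "A" (cited)
    or "B" (uncited). The gap definition is an assumption for illustration.
    """
    by_query = {}
    for query, condition, score in rows:
        by_query.setdefault(query, {"A": [], "B": []})[condition].append(score)

    gaps = {}
    for query, groups in by_query.items():
        if min(len(groups["A"]), len(groups["B"])) < MIN_PER_CONDITION:
            continue  # too few pages in one condition for a query-level gap
        gaps[query] = mean(groups["A"]) - mean(groups["B"])
    return gaps


def sign_counts(gaps):
    """Count queries where cited pages scored higher, and where uncited did."""
    cited_higher = sum(1 for g in gaps.values() if g > 0)
    uncited_higher = sum(1 for g in gaps.values() if g < 0)
    return cited_higher, uncited_higher


# Synthetic rows, NOT data from the study.
rows = [
    ("q1", "A", 0.60), ("q1", "A", 0.50), ("q1", "A", 0.70),
    ("q1", "B", 0.40), ("q1", "B", 0.50), ("q1", "B", 0.45),
    ("q2", "A", 0.30), ("q2", "A", 0.40), ("q2", "A", 0.35),
    ("q2", "B", 0.55), ("q2", "B", 0.50), ("q2", "B", 0.60),
    ("q3", "A", 0.50), ("q3", "A", 0.50), ("q3", "A", 0.50),
    ("q3", "B", 0.50), ("q3", "B", 0.50),  # only 2 uncited pages: excluded
]

gaps = query_gaps(rows)
print(gaps)
print(sign_counts(gaps))
```

In this toy input the two surviving queries point in opposite directions, which is the same pattern the 11-versus-6 split shows at larger scale: an average near zero built from gaps that disagree in sign.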
What passage-level retrieval implies
Google announced passage ranking in October 2020: the ability to identify individual sections of a page and determine relevance from that section alone, independent of what the rest of the page covers. The official description as of December 2025 is “an AI system we use to identify individual sections or ‘passages’ of a web page” (Google Search Central, 2025). What counts as a passage, in Google's implementation, is not public.
Passage ranking as Google describes it and the canonical RAG designs both operate on sub-document chunks, not full pages. Karpukhin et al. (2020) defined a passage as a non-overlapping 100-word segment, splitting English Wikipedia into 21 million such units for Dense Passage Retrieval. Lewis et al. (2020) used the same corpus for the original RAG architecture. In both designs the page boundary is irrelevant: whether a page covers 5 topics or 15 is invisible to a retriever that evaluates one 100-word window at a time. What determines retrieval is whether that window, read in isolation, answers the sub-query it is matched against.
Canonical RAG and DPR designs use 100-word non-overlapping passage windows (Karpukhin et al., 2020; Lewis et al., 2020).
This gives the null finding on coverage breadth a concrete explanation. Coverage score measures how many concepts the page covers in aggregate. A passage-level retriever cannot see the aggregate. It sees one chunk.
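To make the "one chunk" point concrete, here is a minimal sketch of the non-overlapping 100-word split that Karpukhin et al. (2020) describe. It is a simplification under stated assumptions: words are separated on whitespace, the function name `split_into_passages` is hypothetical, and nothing here reflects how Google defines a passage, which is not public.

```python
def split_into_passages(text, words_per_passage=100):
    """Split text into consecutive, non-overlapping windows of N words.

    Whitespace tokenization is an assumption made for this illustration.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_passage])
        for i in range(0, len(words), words_per_passage)
    ]


# A stand-in "page" of 250 placeholder words.
page = " ".join(f"w{i}" for i in range(250))
passages = split_into_passages(page)

# Each passage is scored on its own; the retriever never sees the page total.
print([len(p.split()) for p in passages])  # [100, 100, 50]
```

Once a page is cut this way, a page-level property such as the number of concepts covered is not an input to any single retrieval decision. Only the content of each window is.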
The one signal worth noting
The closest thing to a real signal is in the metric that measures how well pages integrate their core concepts: 49.1% of core concepts in cited pages are well-integrated, versus 44.2% in uncited pages. That is a 4.9 percentage point gap, and the direction held in 12 of 17 queries.
The 95% confidence interval on that gap runs from −0.9pp to +10.5pp, which means it just barely includes zero. We do not call this a finding. We note it because it is the metric that most consistently trended the same direction, and because the distinction between a page that “covers” a concept and one that “integrates” it into a coherent explanation maps onto the passage-level quality criterion the retrieval literature identifies as relevant.
Nainwani and Baban (2025) distinguish between a passage that is semantically matched to a query and one that is contextually complete enough to reason from. A concept that is merely mentioned creates a section that references it without explaining it. A concept that is well-integrated creates a section that stands on its own. In passage-retrieval terms, the second is the retrievable one. The integration depth gap is directional, not significant, and this study cannot establish causation. But the direction is consistent with what passage-level retrieval would predict.
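For readers who want to see how an interval like this is read, the following is a minimal sketch of a query-level bootstrap. It assumes a percentile bootstrap over per-query gaps, resampling queries with replacement 5,000 times; the exact procedure used in the study is not specified beyond "query-level bootstrap", so the percentile method, the function names, and the seed are assumptions. The values in `synthetic_gaps` are invented for the example and are not the study's per-query results.

```python
import random
from statistics import mean


def bootstrap_ci(gaps, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-query gaps.

    Resamples queries (not pages) with replacement. This is an assumed
    procedure for illustration, not the study's confirmed method.
    """
    rng = random.Random(seed)
    n = len(gaps)
    means = sorted(mean(rng.choices(gaps, k=n)) for _ in range(n_resamples))
    lo_idx = round((alpha / 2) * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return means[lo_idx], means[hi_idx]


def includes_zero(ci):
    """True when the interval does not exclude a gap of zero."""
    lo, hi = ci
    return lo <= 0.0 <= hi


# 17 synthetic per-query gaps in percentage points, NOT the study's data.
synthetic_gaps = [12.0, -6.5, 8.0, 15.5, -3.0, 4.0, 9.5, -11.0, 6.0,
                  2.5, 18.0, -4.5, 7.0, 10.0, -8.0, 5.5, 13.0]

ci = bootstrap_ci(synthetic_gaps)
print(ci, includes_zero(ci))

# The interval reported in the text, checked the same way.
print(includes_zero((-0.9, 10.5)))  # True
```

Resampling at the query level rather than the page level treats each query as one observation, so a query with many cited pages does not count for more than a query with few.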
What this does not mean
This study does not say structural completeness is irrelevant to AI retrieval. Our findability study showed that pages built around the right structural recommendations are found 84% of the time versus 4% without them. That result holds. This study asks a different question: whether the pages Google happens to cite in AI Overviews score higher on ContentGrapher metrics than the pages it does not cite. The answer is no.
The difference matters because AI Overviews citation is not the same as AI retrieval. AIO citation reflects Google's ranking of sources for a surface answer: it is influenced by authority, freshness, exact-phrase match, domain trust, and factors that have nothing to do with whether a page explains its topic thoroughly. The decoy study showed that structural quality matters for the retrieval systems that power AI chat and search tools. Whether it also predicts AIO citation from Google specifically is what this study tested, and the answer is that it does not.
What we cannot claim
- 01 We measured one SERP snapshot for each query. AIO citation sets change as Google refreshes the overview, so a different snapshot window would likely produce a partially different corpus. The result here applies to one moment.
- 02 The study covers 22 queries, all in the explain category. AIO citation selection may work differently for guide, compare, or evaluate queries. We cannot generalize beyond explain queries.
- 03 We measure structural completeness. We do not measure authority, domain trust, freshness, or exact-phrase match. Some or all of those factors may be what AIO selection actually optimizes for. This study finds no evidence that structural completeness is the primary signal; it does not identify the actual signal.
- 04 Condition B pages were ranked in the top 10 organically, so they already meet a high relevance bar. They are not random pages. A study comparing cited pages to random web pages would likely find a different result.
- 05 We measured structural quality at the page level. If AIO source selection operates at the passage level, the coverage score cannot see whether any individual section of a page answers a sub-query in isolation. A page with narrow breadth but one deeply developed section may be more retrievable than a page with broad coverage and no self-contained passage. Testing that hypothesis requires scoring sections rather than pages.
The answer
Pages cited in Google AI Overviews are not measurably more structurally complete than the organic pages Google did not cite for the same queries. Coverage score gap: 0.011. Core concept count gap: −0.108. Question coverage rate gap: −0.002. None of these clear a significance threshold. The only directional trend, in concept integration (4.9pp), is borderline and not significant at 95%.
Building structurally complete content is still the right call for AI retrieval, where the findability study showed it matters enormously. But if your goal is specifically to appear in Google's AI Overview, our data says page-level structural breadth is not where the selection is happening. The retrieval literature suggests the answer is in the passage, not the page. What that analysis would find is the next question.
References
Google. (2025, December 10). A guide to Google Search ranking systems. Google Search Central. https://developers.google.com/search/docs/appearance/ranking-systems-guide
Karpukhin, V., Oguz, B., Min, S., Lewis, P., Wu, L., Edunov, S., Chen, D., & Yih, W. (2020). Dense passage retrieval for open-domain question answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (pp. 6769–6781). Association for Computational Linguistics. https://arxiv.org/abs/2004.04906
Lee, J., Wettig, A., & Chen, D. (2021). Phrase retrieval learns passage retrieval, too. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 3661–3672). Association for Computational Linguistics. https://arxiv.org/abs/2109.08133
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459–9474. https://arxiv.org/abs/2005.11401
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://arxiv.org/abs/2307.03172
Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge University Press.
Nainwani, H., & Baban, H. (2025). Search is not retrieval: Decoupling semantic matching from contextual assembly in RAG. arXiv:2511.04939. https://arxiv.org/abs/2511.04939