ContentGrapher

Study design and methodology

Research question

Do pages cited in Google AI Overviews score higher on ContentGrapher structural completeness metrics than pages that rank organically for the same queries but are not cited?

The study is observational: we collected existing pages as AIO cited them on a single snapshot date and measured structural quality after the fact. No pages were created or modified for this study.

Conditions

Condition A (cited): Pages appearing in the AIO references list for at least one retained query. When a URL appeared as both a citation for one query and an uncited organic result for another, it was classified as Condition A globally (cited status wins).

Condition B (uncited): Pages in the top-10 organic results for a qualifying query that were not cited in any AIO across the full corpus.
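
The classification rule can be sketched as follows. This is a minimal illustration, not the study's code: the `serps` input shape and the function name are assumptions. The text does not say whether a URL cited for one query also counts toward Condition A for other queries where it only ranks organically, so the sketch simply keeps it out of Condition B everywhere.

```python
def classify_urls(serps):
    """Split each query's URLs into Condition A (cited) and B (uncited).

    serps maps query -> {"cited": [urls], "organic": [urls]}
    (hypothetical shape, post-exclusion).
    """
    # "Cited status wins": a URL cited for any query can never be Condition B.
    cited_anywhere = {url for s in serps.values() for url in s["cited"]}
    conditions = {}
    for query, s in serps.items():
        a = list(dict.fromkeys(s["cited"]))  # de-duplicate, keep order
        b = [u for u in dict.fromkeys(s["organic"]) if u not in cited_anywhere]
        conditions[query] = {"A": a, "B": b}
    return conditions
```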

Query corpus

25 candidate queries were pre-registered before the SERP snapshot: five per industry category (technology, marketing, business, education, finance), all in the explain query format (“what is X”). Queries were selected to represent generic informational concepts where AIO is commonly triggered and where the corpus would be free of dominant brand confounds.

A query was retained if its SERP showed an AI Overview with at least 2 cited sources and at least 1 uncited top-10 organic result after exclusions. 22 of 25 queries were retained (3 excluded: 1 no AIO present, 2 with 0 citations).
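
As a small illustration, the retention rule reduces to a three-part check on each query's post-exclusion counts. The function name and the example counts below are hypothetical; only the thresholds come from the text.

```python
def is_retained(has_aio, n_cited, n_uncited_top10):
    """Apply the retention rule to one query's post-exclusion counts."""
    return bool(has_aio) and n_cited >= 2 and n_uncited_top10 >= 1

# Hypothetical per-query counts, for illustration only.
candidates = {
    "query with AIO and citations": (True, 4, 5),
    "query with no AIO": (False, 0, 9),
    "query with AIO but no citations": (True, 0, 8),
}
retained = [q for q, counts in candidates.items() if is_retained(*counts)]
```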

SERP data collection

All SERP data was collected via the DataForSEO Live Advanced endpoint (serp/google/organic/live/advanced) on 2026-06-16 between 02:02 and 02:04 UTC. Parameters: location_code 2840 (United States), language English, device desktop, depth 10. AIO data was extracted from items of type ai_overview; citations from the references array (type ai_overview_reference). Organic results were collected to depth 10 (see Relaxations below).
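
The extraction step might look like the sketch below, which operates on the items list of one SERP result. The item types ai_overview and ai_overview_reference and the references array are named above; the "organic" type label and the "url" field are assumptions about the response shape and should be checked against the DataForSEO documentation.

```python
def split_serp_items(items, depth=10):
    """Return (cited_urls, organic_urls) from one SERP result's items list."""
    cited, organic = [], []
    for item in items:
        kind = item.get("type")
        if kind == "ai_overview":
            # Citations come from the overview's references array.
            for ref in item.get("references") or []:
                if ref.get("type") == "ai_overview_reference" and ref.get("url"):
                    cited.append(ref["url"])
        elif kind == "organic" and item.get("url"):  # assumed type label
            organic.append(item["url"])
    return cited, organic[:depth]
```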

URL exclusions

The following URL types were excluded before Phase 1 analysis: Wikipedia, Reddit, Quora, Stack Overflow, LinkedIn, YouTube, Google properties, PDF files, and URLs returning non-200 HTTP status. These represent user-generated content aggregators, paywalled networks, or formats incompatible with Phase 1 extraction. 60 URLs were excluded pre-analysis.
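
A filter along these lines would implement the exclusions. It is a sketch under assumptions: the domain set is an illustrative stand-in for the list above (in particular, "google.com" stands in for "Google properties", which the text does not enumerate), and PDF detection by file extension is a simplification.

```python
from urllib.parse import urlparse

# Illustrative stand-in for the exclusion list; not the study's actual list.
EXCLUDED_DOMAINS = {
    "wikipedia.org", "reddit.com", "quora.com", "stackoverflow.com",
    "linkedin.com", "youtube.com", "google.com",
}

def is_excluded(url, status_code):
    """True if the URL should be dropped before Phase 1 analysis."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Match the domain itself or any subdomain (en.wikipedia.org, etc.).
    on_excluded_domain = any(
        host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS
    )
    is_pdf = parsed.path.lower().endswith(".pdf")
    return on_excluded_domain or is_pdf or status_code != 200
```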

Dominant brand pages were identified via a pre-registered shortlist (major cloud platforms, enterprise software companies, large media publishers) and dynamically for any domain appearing as a citation in 3 or more retained queries. 17 dominant-brand pages were identified and flagged for the non-dominant sensitivity analysis; they remain in the primary analysis.
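
The dynamic part of the dominant-brand rule can be sketched as a per-domain count of retained queries in which the domain is cited. The function name, the input shape, and the "www." normalization are assumptions; the threshold of 3 queries comes from the text.

```python
from collections import Counter
from urllib.parse import urlparse

def dominant_domains(cited_by_query, shortlist, min_queries=3):
    """Union of the pre-registered shortlist and domains cited in >= min_queries queries.

    cited_by_query maps query -> list of cited URLs (hypothetical shape).
    """
    counts = Counter()
    for urls in cited_by_query.values():
        # Count each domain at most once per query.
        domains = {
            (urlparse(u).hostname or "").lower().removeprefix("www.") for u in urls
        }
        counts.update(domains)
    dynamic = {d for d, n in counts.items() if d and n >= min_queries}
    return set(shortlist) | dynamic
```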

Phase 1 analysis

Each URL was analyzed using ContentGrapher Phase 1: plain-text fetch followed by LLM extraction of the concept map, question coverage, and content context. No audience specification was provided (all queries are generic explain queries). Phase 1 uses no credits. Concurrency was capped at 5 simultaneous analyses.

Per-item results were cached immediately. 217 of 232 analyzable URLs returned a valid Phase 1 result; 15 failed fetch or extraction and were excluded. The exclusion rate was equal across conditions and is documented in the data page.
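
The concurrency cap and per-item result handling could be sketched with a semaphore, as below. The `analyze` coroutine is a hypothetical stand-in for the Phase 1 call, and the in-memory dict stands in for whatever cache was actually used.

```python
import asyncio

async def analyze_all(urls, analyze, limit=5):
    """Run analyze(url) for every URL with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)
    results = {}

    async def worker(url):
        async with semaphore:
            try:
                # Store each result as soon as it is available.
                results[url] = await analyze(url)
            except Exception as exc:
                # Failed fetches or extractions are recorded, not raised,
                # so they can be excluded and counted afterwards.
                results[url] = exc

    await asyncio.gather(*(worker(u) for u in urls))
    return results
```

Failures can then be separated with `isinstance(value, Exception)` to reproduce a count of the kind reported above.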

Metrics

Four primary metrics, each derived from Phase 1 output:

  1. Coverage score (0–1): Computed by computeCoverageScore() from the observed concept map and augmented question coverage. Combines concept presence, integration depth, and dimensional coverage into a single 0–1 score.
  2. Core concept count: Number of concepts classified as core (the primary retrieval role of the page) in the Phase 1 concept map.
  3. % well-integrated (core): Fraction of core concepts with integration level well_integrated rather than merely mentioned or lightly covered.
  4. Question coverage rate: Fraction of the 8 diagnostic question dimensions (whatIsIt, howDoesItWork, etc.) that have at least one answered question in the augmented question coverage output.
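
Metrics 2 to 4 are simple counts and fractions, sketched below. The input shapes and field names ("role", "integration") are assumptions about the Phase 1 output, and computeCoverageScore() is not reproduced because its formula is not given here.

```python
def page_metrics(concepts, answered_by_dimension, n_dimensions=8):
    """Compute core concept count, % well-integrated (core), and question coverage rate.

    concepts: list of {"role": ..., "integration": ...} dicts (hypothetical shape).
    answered_by_dimension: dimension name -> number of answered questions.
    """
    core = [c for c in concepts if c["role"] == "core"]
    well = [c for c in core if c["integration"] == "well_integrated"]
    covered = sum(1 for n in answered_by_dimension.values() if n > 0)
    return {
        "core_concept_count": len(core),
        # Undefined when a page has no core concepts.
        "pct_well_integrated_core": len(well) / len(core) if core else None,
        "question_coverage_rate": covered / n_dimensions,
    }
```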

Statistical analysis

All CIs are query-level bootstrap (5,000 resamples). The unit of analysis is the per-query mean gap (mean_A minus mean_B for that query). Bootstrap resampling is over the 17 qualifying queries (those with n ≥ 3 URLs in both conditions). This controls for the fact that different queries have different numbers of cited and uncited pages.
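
A query-level bootstrap of this kind can be sketched as follows. The percentile interval is an assumption (the text does not say which bootstrap interval was used), as are the function name, the input shape, and the fixed seed.

```python
import random
from statistics import mean

def query_level_bootstrap(scores_by_query, n_resamples=5000, min_n=3, seed=0):
    """Mean per-query gap (A minus B) with a 95% percentile bootstrap CI.

    scores_by_query maps query -> {"A": [scores], "B": [scores]}.
    """
    # One gap per qualifying query, so each query carries equal weight.
    gaps = [
        mean(s["A"]) - mean(s["B"])
        for s in scores_by_query.values()
        if len(s["A"]) >= min_n and len(s["B"]) >= min_n
    ]
    if not gaps:
        raise ValueError("no query meets the minimum sample size")
    rng = random.Random(seed)
    # Resample queries (not pages) with replacement.
    boot = sorted(
        mean(rng.choices(gaps, k=len(gaps))) for _ in range(n_resamples)
    )
    lo = boot[int(0.025 * n_resamples)]
    hi = boot[int(0.975 * n_resamples) - 1]
    return mean(gaps), (lo, hi), len(gaps)
```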

A non-dominant sensitivity analysis was run by dropping Condition A pages from dominant domains for each query before computing the per-query gap. The purpose is to detect whether aggregate results are driven by large authority domains (IBM, HubSpot, Adobe, etc.) that are cited regardless of structural quality.
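
The sensitivity variant changes only how each per-query gap is computed, as in this sketch. The tuple layout and function name are assumptions. The text does not say whether the n ≥ 3 floor is re-checked after dominant pages are dropped, so the sketch only requires both groups to be non-empty.

```python
from statistics import mean
from urllib.parse import urlparse

def per_query_gap(pages, dominant, drop_dominant=False):
    """Gap (mean A minus mean B) for one query.

    pages: list of (url, condition, score) tuples (hypothetical shape).
    dominant: set of dominant domains without a "www." prefix.
    """
    def is_dominant(url):
        host = (urlparse(url).hostname or "").lower().removeprefix("www.")
        return host in dominant

    # Only Condition A pages are filtered; Condition B is left unchanged.
    a = [s for u, c, s in pages
         if c == "A" and not (drop_dominant and is_dominant(u))]
    b = [s for u, c, s in pages if c == "B"]
    if not a or not b:
        return None
    return mean(a) - mean(b)
```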

Pre-registered gates

Five findings were pre-registered before any URL was analyzed. All gates were specified in the PRD before SERP data collection began. Results below:

Gate: Corpus qualification
Criterion: ≥ 18 of 25 candidate queries retained (AIO present + ≥ 2 citations + ≥ 1 uncited organic result)
Result: PASS
Detail: 22 of 25 queries qualified. 3 excluded: 1 with no AIO, 2 with 0 citations.

Gate: Query-level analysis floor
Criterion: ≥ 3 URLs per condition (A and B) per query for query-level bootstrap
Result: PASS (17/22)
Detail: 17 of 22 queries met n ≥ 3 in both conditions. 5 queries were retained in the corpus but excluded from bootstrap CI computation.

Gate: Coverage score (Finding A)
Criterion: Non-dominant gap ≥ 0.06 AND 95% CI excludes zero
Result: FAIL
Detail: Non-dominant gap = 0.007, CI [−0.032, 0.045]. Far below threshold. Finding narrowed.

Gate: Coverage score (Finding B)
Criterion: Full gap ≥ 0.06 AND non-dominant gap < 0.03 (authority artifact)
Result: FAIL
Detail: Full gap = 0.011, below the 0.06 threshold.

Gate: Metric divergence (Finding C)
Criterion: Coverage gap < 0.04 AND well-integrated gap ≥ 10pp with CI excluding zero
Result: FAIL
Detail: Coverage gap = 0.011 (< 0.04, pass) but well-integrated gap = 4.9pp (< 10pp, fail).

Gate: Clean null (Finding D)
Criterion: All primary metrics gap < 0.03 in non-dominant, CIs overlap zero
Result: FAIL (partial)
Detail: Coverage score (0.007) and question coverage rate (−0.002) meet the threshold. Well-integrated (5.6pp) does not. Closest to D of all findings.

None of the pre-registered finding gates (A through D) passed; only the two corpus-level gates did. The published finding is narrowed from Finding A (structural correlation) to a null result: no significant difference in structural completeness between cited and uncited pages. The only directional trend (% well-integrated, 4.9pp) is noted on the main page as a borderline observation, not a finding.
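
The gate logic for Findings A to C can be checked mechanically against the reported numbers, as in this sketch. The function names are illustrative. The well-integrated CI is not reported in this section, so it is passed as None and the gate is treated as unmet without it.

```python
def ci_excludes_zero(ci):
    lo, hi = ci
    return lo > 0 or hi < 0

def finding_a(nd_gap, nd_ci):
    """Non-dominant gap >= 0.06 AND 95% CI excludes zero."""
    return nd_gap >= 0.06 and ci_excludes_zero(nd_ci)

def finding_b(full_gap, nd_gap):
    """Full gap >= 0.06 AND non-dominant gap < 0.03 (authority artifact)."""
    return full_gap >= 0.06 and nd_gap < 0.03

def finding_c(coverage_gap, wi_gap_pp, wi_ci_pp=None):
    """Coverage gap < 0.04 AND well-integrated gap >= 10pp with CI excluding zero."""
    return (
        coverage_gap < 0.04
        and wi_gap_pp >= 10
        and wi_ci_pp is not None
        and ci_excludes_zero(wi_ci_pp)
    )

# Values as reported in the gate results above.
results = {
    "A": finding_a(0.007, (-0.032, 0.045)),
    "B": finding_b(0.011, 0.007),
    "C": finding_c(0.011, 4.9),
}
```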

Relaxations

Organic candidate pool widened to top-10

Original spec: Organic candidate pool: top-5 organic results per query
Relaxed to: Organic candidate pool: top-10 organic results per query
Reason: After the s1 SERP snapshot with top-5 organic, only 6 of 20 qualifying queries met the n ≥ 3 gate for Condition B. Google cites most top-5 organic pages for explain queries, leaving too few uncited candidates. Widening to top-10 added positions 6–10, which rank for the query but are rarely cited. This is documented here as a design relaxation; it is not a post-hoc adjustment.

What this study cannot claim

  1. Causal claims about what predicts AIO citation. We find no evidence that structural completeness is a primary signal; we do not identify the actual selection mechanism.
  2. Generalization beyond explain queries. AIO citation for guide, compare, or evaluate queries may follow different patterns.
  3. Generalization beyond a single SERP snapshot. AIO citation sets change as Google refreshes. A different date would produce a different corpus.
  4. Claims about uncited pages in general. Condition B contains only top-10 organic pages. They already pass Google's relevance filter. This study does not compare cited pages to low-quality or low-ranking pages.