Improved · June 11, 2026

The Decoy Study Validates the Gap Picks

You can now read a controlled study showing that, on pages with five or more flagged gaps, filling ContentGrapher's gap picks lifts AI retrieval by about 11 percentage points more than adding the same volume of content to parts the page already covers.

Problem

Until now, the case that filling flagged gaps actually improves AI retrieval rested on the underlying scoring math and small spot checks. There was no controlled evidence that the specific picks matter, separate from the general effect of adding more structural content to a page.

Context

Two questions kept surfacing from users evaluating ContentGrapher: do the analyzer's picks beat random structural addition, and is the gap count itself meaningful? A study designed around a decoy arm, where the control treatment adds equal content to parts the analyzer said were already fine, is the only way to isolate the signal.

Why now

Coverage Score v3 stabilised the underlying instrument three days ago. With the scoring noise reduced, running a 40-page controlled test became worthwhile, because variation in the results can now be read as signal rather than measurement drift.

What changed

The Decoy Study is live at contentgrapher.io/research/decoy-study. Forty third-party pages were each rendered in three versions (original, treatment, decoy), then run through a standard retrieval pipeline with 320 questions written by two AI personas blind to the page contents.
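As a rough illustration of what "lift over the original page" means in a three-version design like this, the sketch below computes retrieval rates per version and subtracts the original's rate. It is a minimal sketch under stated assumptions, not the study's pipeline: the function names, the data layout (one boolean per question per version), and the eight made-up outcomes are all hypothetical.

```python
def retrieval_rate(outcomes):
    """Percentage of questions for which the page version was retrieved."""
    return 100.0 * sum(outcomes) / len(outcomes)


def lift_over_original(results):
    """Lift in percentage points for each non-original version.

    `results` maps a version name to a list of booleans, one per question.
    This layout is an assumption made for illustration.
    """
    base = retrieval_rate(results["original"])
    return {
        version: retrieval_rate(hits) - base
        for version, hits in results.items()
        if version != "original"
    }


# Made-up outcomes for a single page; not data from the study.
example = {
    "original":  [True, False, False, True, False, False, True, False],
    "treatment": [True, True, False, True, True, False, True, False],
    "decoy":     [True, False, True, True, False, False, True, False],
}

lifts = lift_over_original(example)
for version, lift in lifts.items():
    print(f"{version}: {lift:+.1f}pp over original")
```

Comparing both arms against the same original is what lets the decoy absorb the general effect of adding more content, so that any remaining difference can be attributed to which parts were filled.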

Lift in AI retrieval over the original page, treatment vs decoy, split by gap count. On pages with 0 to 2 flagged gaps, both arms lifted retrieval by 1.7 percentage points (pp). On pages with 3 to 4 flagged gaps, treatment lifted retrieval by 14.2pp vs 12.5pp for the decoy. On pages with 5 to 8 flagged gaps, treatment lifted retrieval by 17.5pp vs 6.2pp for the decoy, a reported difference of 11.2pp.
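The treatment-minus-decoy difference per bucket can be recomputed from the rounded figures quoted above. The sketch below does only that arithmetic; the bucket labels and function name are illustrative. Note that subtracting the rounded figures gives 11.3pp for the top bucket, so the reported 11.2pp presumably comes from unrounded per-page values (an assumption here; the per-page data on the study page would settle it).

```python
# Lifts in percentage points (pp), copied from the figures quoted in the text.
reported = {
    "0-2 gaps": {"treatment": 1.7, "decoy": 1.7},
    "3-4 gaps": {"treatment": 14.2, "decoy": 12.5},
    "5-8 gaps": {"treatment": 17.5, "decoy": 6.2},
}


def arm_difference(bucket):
    """Treatment lift minus decoy lift, rounded to one decimal place."""
    return round(bucket["treatment"] - bucket["decoy"], 1)


differences = {name: arm_difference(b) for name, b in reported.items()}
for name, diff in differences.items():
    print(f"{name}: treatment minus decoy = {diff:+.1f}pp")
```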

The headline finding is the split. On pages with five or more flagged gaps, the analyzer's specific picks lifted retrieval by 11.2 percentage points more than the decoy. On pages with two or fewer, the two arms moved retrieval by the same amount. This makes the gap count itself a signal: when ContentGrapher flags many gaps, which gaps you fill matters; when it flags few, the page is already close to complete and any structural addition helps about the same.

The study page links to the full methodology, the per-page data, and a NotebookLM demo. One page that did not fit the pattern is also reported in full.