When the tool says a concept is missing, is it really, and does adding it help the page get retrieved?

Our Depth Necessity Study found that fleshing out a concept a page already mentions thinly rarely helps it get retrieved, and ended on a bigger question: maybe closing a genuine absence is what really matters, while deepening what is already there buys little. This study put that to the test. The answer has two parts. There is no “presence premium” in the concepts the score flags, because most of those flags are not real gaps: the page can already answer about six in ten of them. But when we engineer a genuine gap and close it, retrievability jumps. Presence matters; the flags just rarely point at a true absence.

The finding

Adding a concept our tool flags as “missing” helps a page answer its own questions no more than deepening one it already mentions, because roughly 62% of those flagged-missing concepts are already answerable from the page. A controlled experiment, where we remove a concept the page genuinely relied on and add it back, shows presence does matter once the gap is real. So the lever for the score is the precision of its concept flags, not the balance between presence and depth.

The question, and why it matters

The coverage score blends two things in equal measure: whether a page covers the right concepts at all (presence), and how fully it develops them (depth). If closing a real absence helps retrieval far more than deepening a present concept, the score would be over-rewarding depth, and we would want to reweight it, a high-stakes change that re-bands the whole tool. So before touching the score, we measured the actual retrieval payoff of three moves on the same real pages: adding a flagged-missing concept, deepening a flagged-thin one, and, as a placebo, padding one already well developed.

How we tested it

We took 45 real explanatory pages and, for every concept the tool flags, asked a simple retrieval question: take a real query a reader with this page’s intent would ask, and see whether a model can answer it from the page as written, then whether adding that concept’s developed content changes the answer. A concept is retrieval-necessary if adding it turns a failing answer into a sufficient one. This is on-page answerability, free of the brand-and-authority noise that dominates rankings and citations. A neutral model wrote the queries and answers; a cross-family panel of three judges from three different makers scored them; and a known-answer control battery confirmed the measure works (it caught every obvious lift, 8 of 8, and ignored every irrelevant addition, 16 of 16).
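The retrieval-necessity rule above can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: the `JudgedAnswer` type, the `retrieval_necessary` function, and the majority-of-three aggregation are assumptions made for the example. The study states only that three judges scored each answer, not how their scores were combined.

```python
from dataclasses import dataclass


@dataclass
class JudgedAnswer:
    """One answer to a reader query, with one 'sufficient?' vote per judge."""
    votes: tuple[bool, bool, bool]

    def sufficient(self) -> bool:
        # Assumption: a simple majority of the three judges decides.
        return sum(self.votes) >= 2


def retrieval_necessary(before: JudgedAnswer, after: JudgedAnswer) -> bool:
    """True only when adding the concept flips a failing answer to a sufficient one."""
    return (not before.sufficient()) and after.sufficient()


# Page as written fails; page with the concept added succeeds.
flipped = retrieval_necessary(
    JudgedAnswer((False, False, True)),
    JudgedAnswer((True, True, False)),
)

# Page already answers the query, so the addition cannot be necessary.
already_fine = retrieval_necessary(
    JudgedAnswer((True, True, True)),
    JudgedAnswer((True, True, True)),
)
```

Under this definition an addition earns credit only for a flip, so content that merely improves an already sufficient answer counts as no lift.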

Finding 1: most “missing” concepts are not missing

The surprise was in the flags themselves. When the tool marks a concept as missing from a page, the page can usually already answer a question about it, just through related wording rather than by naming the concept. That holds for about 62% of flagged-missing concepts, and the rates for thin and well-developed ones are in the same range (57% and 74%). The label tracks the words on the page, not whether the page can answer the reader.

How often the page could already answer a flagged concept’s own question

Flagged “missing” (alreadyPresent: no): 62%

Flagged “thin” (alreadyPresent: partial): 57%

Already developed (alreadyPresent: yes): 74%

Roughly six in ten of the concepts the tool flags as missing are already answerable straight from the page, which covers them through related wording without naming the concept. The “missing” label is a weak signal of a real retrieval gap.
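One way to read these rates is as a ceiling on flag precision: a flag can mark a real retrieval gap only when the page cannot already answer the concept's question. The short sketch below works that out from the reported figures. The name `real_gap_ceiling` is hypothetical, and treating "not already answerable" as the upper bound on real gaps is an interpretation for illustration, not a metric the study reports.

```python
# Reported share of flagged concepts whose own question the page
# could already answer (figures from this study).
ALREADY_ANSWERABLE = {"missing": 0.62, "thin": 0.57}


def real_gap_ceiling(already_answerable: float) -> float:
    """Upper bound on the share of flags that can mark a real retrieval gap."""
    return 1.0 - already_answerable


for state, rate in ALREADY_ANSWERABLE.items():
    print(f"flagged {state}: at most {real_gap_ceiling(rate):.0%} can be real gaps")
```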

Finding 2: no premium in the flags

Because of that, developing a flagged concept barely moves the answer, no matter its state. Adding a flagged-missing concept helped about 1 in 6 times, deepening a thin one about 1 in 10, padding a developed one about 1 in 12. All low, and the gap between adding and deepening was a statistically unreliable 7 points. On its own flags, the score shows no presence premium: there is no retrieval case here for reweighting it toward presence and away from depth.

How often developing a concept improved the page’s own-question answer

Flagged “missing” (add the concept): 17%

Flagged “thin” (deepen the mention): 10%

Already developed (redundant padding): 8%

A genuinely removed concept (controlled real gap): 37%

Developing a flagged concept barely moves the page’s answer, whether it is flagged missing, thin, or already developed: all under one in five. But when we forcibly remove a concept the page relied on, creating a real gap, adding it back helps far more often, and 82% of the time when the removal actually broke the answer. Presence matters; the flags just rarely point at a real absence.

Finding 3: presence does matter, once the gap is real

To separate “presence does not matter” from “the flags are noisy,” we ran a controlled experiment. We took concepts a page demonstrably answered, surgically removed each one, and measured adding it back. Now the gap is guaranteed real. Here adding the concept back helped 37% of the time, against 10% for deepening a thin concept, a 27-point gap that holds up. And when the removal actually broke the page’s answer, adding the concept back restored it 82% of the time. Closing a real absence is a strong retrieval lever. The catch is that removing a concept the page covers opened a real gap only about 45% of the time; in the rest, the page still answered through other content.
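The three figures in this experiment are consistent with one another, which a quick calculation shows. The sketch assumes that adding a concept back can only be credited when its removal actually broke the answer. That assumption follows from the retrieval-necessity definition but is our reading, not a step the study spells out. The variable names are illustrative.

```python
# Reported figures from the controlled experiment.
p_gap_opened = 0.45          # removal actually broke the page's answer
p_restore_given_gap = 0.82   # adding the concept back restored a broken answer

# Assumption: no lift is credited when the removal did not break the answer,
# so the overall lift rate is the product of the two rates.
overall_lift = p_gap_opened * p_restore_given_gap

print(f"implied overall lift: {overall_lift:.1%}")  # 36.9%, in line with the reported 37%
```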

The pre-registered tests, and how they landed

Does adding a flagged-missing concept beat deepening a thin one? (need a +25pt gap)

Result: +7pt [−8, +15], crosses zero. NO PREMIUM.

Controlled real gap: remove a concept the page relied on, then add it back, vs deepening a thin one. (the decisive test)

Result: +27pt [+22, +49], holds. PRESENCE MATTERS FOR REAL GAPS.

Does the instrument credit redundant padding of an already-developed concept? (placebo, ceiling 12%)

Result: 8%. NO, IT BEHAVES.

Does the instrument detect lift when it is obviously there? (known-answer control)

Result: 8/8 flip, 16/16 stay flat. SOUND.

The premium the score assumes is not there in its own flags, because the flags rarely mark a real gap. The controlled test shows the lever is real once the gap is; the placebo and known-answer controls confirm the measure is trustworthy.
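The verdicts above can be expressed as a small decision rule. This is a sketch under stated assumptions: the function `premium_holds` is hypothetical, and the sketch assumes a premium is confirmed when the gap meets the pre-registered +25pt bar and the interval's lower bound stays above zero. The source states the +25pt bar only for the first test, so applying it to the controlled test is also an assumption.

```python
def premium_holds(gap_pt: float, ci_low: float, required_pt: float = 25.0) -> bool:
    """Assumed rule: gap meets the bar and the interval excludes zero."""
    return gap_pt >= required_pt and ci_low > 0


# Reported results: point estimate and lower bound of the interval.
flags_test = premium_holds(gap_pt=7, ci_low=-8)        # flagged-missing vs thin
controlled_test = premium_holds(gap_pt=27, ci_low=22)  # controlled real gap vs thin

# Placebo: credited padding must stay at or under the 12% ceiling.
placebo_ok = 8 <= 12

# Known-answer control: every obvious lift flips, every irrelevant addition stays flat.
control_ok = (8 == 8) and (16 == 16)

print(flags_test, controlled_test, placebo_ok, control_ok)  # False True True True
```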

What this means for the score

The tempting conclusion from the first two findings, that depth is over-rewarded and presence under-rewarded, is wrong. Presence is a strong lever when a gap is genuine. The real issue is upstream: the score’s “missing” flags rarely mark a genuine gap, so rewarding a page for closing them buys little. The improvement worth chasing is not the presence-versus-depth weighting; it is making the concept check track what a page can actually answer, not just which words it contains. We are treating that as a motivated direction for the score, gated on its own validation, not a change this study licenses by itself.

What this study does not claim

01 It does not measure citations or rankings. Those are downstream and dominated by domain authority and brand. This is on-page answerability only.
02 It does not license a change to the coverage score. It motivates improving the precision of the concept flags; any change to the score itself needs its own human-anchored validation.
03 The natural “missing” arm is small and noisy. Genuine absences are rare on real pages, so the decisive presence result rests on the controlled leave-one-out experiment, not the in-the-wild flags.
04 Absolute necessity rates read low across the board, as in the Depth Necessity Study. We treat them as a conservative, relative signal validated in direction by the control battery, not as calibrated probabilities.
05 The corpus is 45 English explanatory pages. Commercial, non-English, and reference pages are out of scope.
