Improved · June 8, 2026

Coverage Score Is Now Depth-Sensitive

You can now trust that a higher coverage score means deeper, more useful content — not just more of it. The scoring instrument was rebuilt to separate shallow mentions from substantive coverage, giving you a signal that moves when your writing actually improves.

Problem

The coverage score wasn't sensitive enough to detect real improvements. When you re-analysed a page after revising it, the score could swing by up to ±10 points from natural variation in the AI judgment alone, which is more than the average gain from a genuine content improvement. The score moved even when the content hadn't, so there was no way to tell whether a revision had worked.
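Why the old score couldn't validate revisions comes down to one comparison: a before/after change only means something if it is larger than the spread you get from re-analysing unchanged content. The sketch below illustrates that comparison. The function name `revision_is_detectable`, the idea of running several analyses per version, and the sample scores are all hypothetical and are not part of the product.

```python
from statistics import median

def revision_is_detectable(before_runs, after_runs):
    """Hypothetical check: is a before/after change bigger than re-analysis noise?

    before_runs and after_runs are repeated coverage scores (0-100) for the
    same page, analysed several times before and after a revision.
    Noise is taken as the widest min-to-max spread in either set of runs.
    """
    noise = max(max(before_runs) - min(before_runs),
                max(after_runs) - min(after_runs))
    delta = median(after_runs) - median(before_runs)
    return abs(delta) > noise


# Invented runs that wander ~10 points swamp a 3-point median gain.
print(revision_is_detectable([60, 70, 65], [68, 74, 64]))  # False
```

Using the min-to-max spread as the noise estimate is deliberately crude. It keeps the example short, but a real check would likely want more runs and a proper statistical test.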

Context

Internal measurement across 14 test pages revealed that the old scoring system produced only 2 distinct median values. Every page landed in one of two rough buckets. That's not enough resolution to show whether content is improving or just different. The root issue was a binary underlying model: a question was either "covered" or "not covered," with no distinction for depth. A single passing sentence counted the same as a thorough explanation.
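You can sketch the resolution measurement this way: take the median of several analyses per page, then count how many distinct medians appear across the corpus. This is an illustration under assumptions. The function `distinct_medians`, the dictionary layout, and the page names and scores are hypothetical.

```python
from statistics import median

def distinct_medians(runs_by_page):
    """Count how many distinct per-page median scores a corpus produces.

    runs_by_page maps a page identifier to its repeated coverage scores.
    An instrument with good resolution yields roughly one distinct median per page.
    """
    return len({median(scores) for scores in runs_by_page.values()})


# Hypothetical corpus: three pages, but only two distinct medians.
corpus = {
    "page-a": [70, 72, 71],
    "page-b": [71, 70, 71],
    "page-c": [75, 74, 75],
}
print(distinct_medians(corpus))  # 2
```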

Why now

The scoring formula was the last major instrument producing noisy output. Other pipeline fields had already been stabilised. With those in place, rebuilding coverage score was the natural next step — and the corpus data made a clear case that the old formula couldn't support the kind of before/after validation the tool exists to provide.

What changed

Coverage scoring now requires genuine depth, not just presence. A question only counts as covered if the content actually explains it. A passing mention no longer qualifies. The same 14 test pages that produced only 2 distinct median scores under the old system now produce 14, one per page. The instrument can now tell a thin FAQ from a deep technical guide.

Before and after: coverage matrix for the same page under v2 and v3 scoring. v2 shows 71% with 6 of 8 questions marked Explained. v3 shows 58% with 4 of 8 Explained and 4 Missing, because two questions that were counted as covered under v2 didn't meet the depth threshold.

The example above shows the same page under both scoring versions. v2 counted 6 of 8 questions as covered and scored the page 71%. v3 found that two of those, "Depends on" and "Who interacts", were only mentioned in passing. They flip to Missing, and the score drops to 58%. Same content, more honest measurement.
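The counting change in this example fits in a few lines of code. The sketch is simplified and rests on stated assumptions. The depth labels (`Depth.MISSING`, `Depth.MENTIONED`, `Depth.EXPLAINED`) and `covered_count` are hypothetical. The sketch also counts covered questions rather than computing the actual score, which is not a plain fraction (71% and 58% are not multiples of 1/8).

```python
from enum import Enum

class Depth(Enum):
    # Hypothetical depth labels for a single question.
    MISSING = 0
    MENTIONED = 1   # passing mention only
    EXPLAINED = 2   # substantive explanation

def covered_count(labels, require_depth):
    """Count questions whose depth meets or exceeds require_depth.

    v2-style counting treats MENTIONED as covered; v3-style requires EXPLAINED.
    """
    return sum(1 for d in labels if d.value >= require_depth.value)


# Eight questions: four explained, two only mentioned, two absent.
page = [Depth.EXPLAINED] * 4 + [Depth.MENTIONED] * 2 + [Depth.MISSING] * 2
print(covered_count(page, Depth.MENTIONED))  # 6 of 8, v2-style
print(covered_count(page, Depth.EXPLAINED))  # 4 of 8, v3-style
```

The rule changes by raising one threshold, so the two mentioned-only questions drop out of the covered count. Nothing else about the page needs to change.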

Score distribution across 6 pages: v2 scores cluster between 68% and 75% (range 7pp). v3 scores spread from 41% to 91% (range 50pp). Same pages, completely different resolution.

Across a set of pages, the difference is even clearer. v2 compressed everything into a 7-point band, narrower than the ±10-point run-to-run swing, so noise outweighed any real difference between pages. v3 spreads the same pages across a 50-point range. Pages with genuinely thin coverage now score in the 40s, while pages with thorough, well-structured explanations reach the 80s and 90s.

Scores will look lower than before on most pages. This is expected, because v2 was inflated by counting mentions as coverage. A score of 65% under v3 is a more honest read of what a retrieval system would actually find useful. If you re-analyse a page that was last analysed before June 8, 2026, the old and new scores are not directly comparable. The dashboard will flag this when a page's score series crosses the scoring version boundary.
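A version-boundary check of this kind could look roughly like the sketch below. It assumes that each stored analysis carries its analysis date and that analyses dated June 8, 2026 or later use v3. The constant `V3_CUTOVER` and the function `crosses_version_boundary` are hypothetical, and the dashboard's actual logic isn't described in this note.

```python
from datetime import date

# Assumption: analyses on or after this date are scored with v3.
V3_CUTOVER = date(2026, 6, 8)

def crosses_version_boundary(analysis_dates):
    """Return True if a page's score history mixes v2 and v3 analyses."""
    has_v2 = any(d < V3_CUTOVER for d in analysis_dates)
    has_v3 = any(d >= V3_CUTOVER for d in analysis_dates)
    return has_v2 and has_v3


print(crosses_version_boundary([date(2026, 5, 20), date(2026, 6, 10)]))  # True
```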