Data
The full numbers, with their counts. Read the methodology for how each was measured.
Coverage and determinism
Two facts frame everything else. The audience does not change which concepts are recommended, and the tool does not change its own answer from run to run.
Priority change by condition
The effect lives in the priority tiers. Each figure is the mean share, across 60 pages, of shared concepts whose tier (essential, important, useful) differs between the two conditions.
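The per-page figure described above can be sketched as follows. This is a minimal illustration, not the project's runner: the function names, the dict-of-tiers input shape, and the toy pages are all assumptions made for the example.

```python
from statistics import mean

def tier_change_share(tiers_a, tiers_b):
    """Share of concepts present in BOTH conditions whose tier differs.

    Each argument maps concept name -> tier string for one condition.
    (Hypothetical input shape; the real persisted JSON may differ.)
    """
    shared = tiers_a.keys() & tiers_b.keys()
    if not shared:
        return None  # no shared concepts, so the page has no defined share
    changed = sum(1 for concept in shared if tiers_a[concept] != tiers_b[concept])
    return changed / len(shared)

def mean_tier_change_share(pages):
    """Mean of the per-page shares, skipping pages with no shared concepts."""
    shares = [tier_change_share(a, b) for a, b in pages]
    return mean(s for s in shares if s is not None)

# Toy data, invented for illustration only.
page1 = (
    {"loops": "essential", "types": "important", "io": "useful"},
    {"loops": "essential", "types": "useful", "io": "useful", "tests": "useful"},
)
page2 = (
    {"sets": "important", "maps": "important"},
    {"sets": "essential", "maps": "useful"},
)
print(round(mean_tier_change_share([page1, page2]), 3))  # 0.667
```

Concepts that appear in only one condition ("tests" in the toy data) are ignored, which matches the restriction to shared concepts.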
The tests
Wilcoxon signed-rank, paired by page, n = 60.
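For readers unfamiliar with the test, here is a minimal pure-Python sketch of a paired Wilcoxon signed-rank computation. The details are assumptions, since the page does not state them: zero differences are dropped, tied magnitudes get average ranks, and the p-value uses the normal approximation with a tie correction and no continuity correction. An established routine such as `scipy.stats.wilcoxon` is the better choice in practice.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation).

    Returns (w_plus, w_minus, z, p). Assumed conventions: zero differences
    are discarded and tied |differences| receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal magnitudes starting at i.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w_plus, w_minus, z, p

# Toy per-page values under two conditions (invented; the study has n = 60).
cond_a = [30, 25, 40, 10, 35]
cond_b = [20, 27, 25, 10, 30]
w_plus, w_minus, z, p = wilcoxon_signed_rank(cond_a, cond_b)
```

The normal approximation is crude for a handful of pairs like the toy input; it is more reasonable at the sample size reported here.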
Direction of the change
When the correct audience moves a concept's tier, it moves it down more often than up: 28 promotions against 44 demotions across the corpus, a net of −16 over 60 pages, or −0.27 per page. The pull is strongest on beginner pages, where a deterministic depth filter demotes a few foundational concepts on its own. Because of that filter, read the beginner row below as part model, part rule.
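The arithmetic behind the net figure, and one way promotions and demotions could be counted, can be sketched as below. The tier ordering (essential above important above useful) is inferred from the order in which the tiers are listed; the function names and toy page are hypothetical.

```python
# Assumed ordering: a higher rank means a higher-priority tier.
TIER_RANK = {"useful": 0, "important": 1, "essential": 2}

def count_moves(baseline, treated):
    """Count promotions and demotions among concepts shared by both conditions."""
    promotions = demotions = 0
    for concept in baseline.keys() & treated.keys():
        delta = TIER_RANK[treated[concept]] - TIER_RANK[baseline[concept]]
        if delta > 0:
            promotions += 1
        elif delta < 0:
            demotions += 1
    return promotions, demotions

def net_promotions_per_page(promotions, demotions, pages):
    """Net promotions (promotions minus demotions) averaged over pages."""
    return (promotions - demotions) / pages

# Toy page, invented for illustration.
no_audience = {"variables": "essential", "recursion": "important", "closures": "useful"}
correct_audience = {"variables": "important", "recursion": "useful", "closures": "important"}
toy_moves = count_moves(no_audience, correct_audience)

# Corpus totals as reported in the text: 28 promotions, 44 demotions, 60 pages.
print(round(net_promotions_per_page(28, 44, 60), 2))  # -0.27
```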
By reader stratum and level
The audience effect is largest on general-population content and smallest on expert content, the opposite of what we expected, though the expert and advanced slices are four pages each and carry little weight.
Negative net promotions mean the correct audience demoted more concepts than it promoted.
The judge panel
Three blind reviewer models were asked which of the no-audience and correct-audience recommendations served the reader better. The raw number is the cautionary one: its lead was a position artifact, and removing the bias collapses that lead to a coin flip.
Counterbalanced binomial p = 1.0. The fall in Fleiss κ from 0.93 to 0.43 shows the high agreement was largely agreement on position, not on content.
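Both statistics named above can be computed in a few lines. The sketch below uses the textbook definitions of Fleiss' κ and an exact two-sided binomial test against a fair coin; it is an assumption that the project computed them this way, and the verdict counts are invented because the actual counts are not given here.

```python
from math import comb

def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    assigning item i to category j. Every row must sum to the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Per-item observed agreement.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the marginal category proportions.
    n_categories = len(counts[0])
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(q * q for q in p_cat)
    return (p_bar - p_e) / (1 - p_e)

def binom_two_sided_p(k, n):
    """Exact two-sided binomial p-value for k successes in n trials, H0: p = 0.5.
    Sums the probability of every outcome no more likely than the observed one."""
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-12)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))

# Toy panel: 3 raters, 2 categories, 4 items (invented).
unanimous = [[3, 0], [0, 3], [3, 0], [0, 3]]
split = [[2, 1], [1, 2], [2, 1], [1, 2]]
kappa_unanimous = fleiss_kappa(unanimous)
kappa_split = fleiss_kappa(split)

# An even split of verdicts gives p = 1.0, the "coin flip" case.
p_even = binom_two_sided_p(5, 10)
```

With an exactly even split every outcome is at most as likely as the observed one, so the two-sided p-value is 1.0, which is the situation the counterbalanced result describes.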
Reproduction
The corpus and its repair provenance, the pre-classifications, all 480 second-phase outputs, the embeddings, the per-page priority analysis, the counterbalanced judge verdicts, and the aggregate statistics are persisted as JSON in the project repository, alongside the runner that produced them.