The audience study · June 2026 · n = 60 pages

Data

The full numbers, with their counts. Read the methodology for how each was measured.

Coverage and determinism

Two facts frame everything else. The audience does not change which concepts are recommended, and the tool does not change its own answer from run to run.

Measure	Value	Reading
Concept-set overlap, correct audience vs. none	0.994	Same concepts recommended either way
Run-to-run concept overlap, same input twice	0.996	Effectively deterministic; nothing for an audience to stabilize
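This page does not define the overlap measure; the methodology does. The sketch below assumes a per-page Jaccard overlap (shared concepts divided by all concepts) averaged over pages, with concepts matched by exact name. Both choices are assumptions, and the page data is invented for illustration.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap |a & b| / |a | b|; two empty sets count as identical (assumption)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_overlap(pages: list[tuple[set[str], set[str]]]) -> float:
    """Average the per-page overlap across the corpus."""
    return sum(jaccard(x, y) for x, y in pages) / len(pages)


# Hypothetical two-page corpus: (no-audience concepts, correct-audience concepts).
pages = [
    ({"flour", "yeast", "proofing"}, {"flour", "yeast", "proofing"}),  # identical: 1.0
    ({"oven", "crumb"}, {"oven", "crumb", "steam"}),                   # one added: 2/3
]
print(round(mean_overlap(pages), 3))  # 0.833
```

A value of 0.994 on this scale would mean the two conditions almost never add or drop a concept on any page.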

Priority change by condition

The effect lives in the priority tiers. Each figure is the mean share, across 60 pages, of shared concepts whose tier (essential, important, useful) differs between the two conditions.
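Read literally, each figure in the table is a per-page share averaged over pages. The following is a minimal sketch under two assumptions: each condition's output for a page reduces to a mapping from concept name to tier, and concepts are matched by exact name. The study persists embeddings, so its matching may be semantic instead. Skipping pages with no shared concepts is also an assumption.

```python
from __future__ import annotations

from typing import Optional


def tier_change_share(a: dict[str, str], b: dict[str, str]) -> Optional[float]:
    """Share of concepts present in both conditions whose tier differs."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # assumption: a page with no shared concepts is skipped
    changed = sum(1 for concept in shared if a[concept] != b[concept])
    return changed / len(shared)


def mean_tier_change(pages: list[tuple[dict[str, str], dict[str, str]]]) -> float:
    """Mean of the per-page shares, ignoring pages that have no shared concepts."""
    shares = [tier_change_share(a, b) for a, b in pages]
    shares = [s for s in shares if s is not None]
    return sum(shares) / len(shares)


# Hypothetical pages: (condition 1 tiers, condition 2 tiers).
pages = [
    ({"flour": "essential", "yeast": "important", "salt": "useful", "oven": "useful"},
     {"flour": "essential", "yeast": "essential", "salt": "useful", "steam": "useful"}),
    ({"knife": "essential", "board": "useful"},
     {"knife": "important", "board": "useful"}),
]
print(round(mean_tier_change(pages), 4))  # 0.4167: mean of 1/3 and 1/2
```

Only shared concepts enter the denominator, so a concept that appears in one condition and not the other changes the overlap measure but not this one.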

Comparison	Change	Reading
Run the same input twice, no audience	5.9%	The noise floor: how much priorities move on their own
Wrong role (recruiter for a recipe)	7.5%	Barely above the floor
Correct audience	8.9%	The real effect
Wrong level (flip the knowledge level)	10.1%	Largest change among the comparisons against the no-audience baseline
Isolate the level (change only level)	11.6%	From the correct audience, change one field
Isolate the role (change only role)	7.7%	From the correct audience, change one field

The tests

Wilcoxon signed-rank, paired by page, n = 60.
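Each test pairs two per-page tier-change shares for the same page and ranks their differences. This sketch uses the normal approximation with a tie correction. Computing z from the smaller rank sum, so that it is never positive, is one common convention that produces negative z values like those reported. Whether the study used exactly this variant is an assumption. In practice `scipy.stats.wilcoxon` does the same job; the sketch only makes the arithmetic visible.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation) on paired samples.

    Returns (z, p). Zero differences are dropped, tied |d| receive average ranks,
    and the variance gets the standard tie correction. z uses the smaller rank sum.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| ascending; a run of equal values shares the mean of its 1-based ranks.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = n * (n + 1) / 2 - w_plus
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Hypothetical per-page shares: correct audience vs. run-to-run floor.
z, p = wilcoxon_signed_rank([0.10, 0.12, 0.05, 0.20, 0.15], [0.05, 0.08, 0.09, 0.10, 0.04])
```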

Test	Comparison	Statistic	Reading
Audience beats noise	Correct audience vs. run-to-run floor	z = −2.27, p = 0.023	Significant. The audience moves priorities beyond the tool’s own wobble.
Role correctness	Correct role vs. wrong role (each vs. no audience)	z = −0.81, p = 0.42	Not significant. A correct role reshuffles no more than a wrong one.
Level vs. role	Change only level vs. change only role	z = −1.94, p = 0.053	Borderline. The level tends to move more than the role.

Direction of the change

When the correct audience moves a concept's tier, it more often moves it down than up: 28 promotions against 44 demotions across the corpus, a net of −0.27 per page. The pull is strongest on beginner pages, where a deterministic depth filter demotes a few foundational concepts on its own. That filter is the reason to read the beginner row below as part model, part rule.
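Counting promotions and demotions needs only an ordering of the three tiers. In the sketch below, a concept that moves two tiers still counts as a single promotion or demotion. That counting rule is an assumption, and the example page is hypothetical. The final line reproduces the corpus net from the totals above.

```python
RANK = {"essential": 0, "important": 1, "useful": 2}  # 0 = highest tier


def promotions_demotions(base: dict, aud: dict) -> tuple:
    """Count shared concepts the audience moved up (promoted) or down (demoted)."""
    up = down = 0
    for concept in base.keys() & aud.keys():
        delta = RANK[base[concept]] - RANK[aud[concept]]
        if delta > 0:
            up += 1    # e.g. useful -> important
        elif delta < 0:
            down += 1  # e.g. essential -> useful
    return up, down


base = {"flour": "important", "salt": "essential", "yeast": "useful"}
aud = {"flour": "essential", "salt": "useful", "yeast": "useful"}
print(promotions_demotions(base, aud))  # (1, 1): flour up, salt down, yeast unchanged

# Corpus net per page from the reported totals: (28 - 44) / 60.
print(round((28 - 44) / 60, 2))  # -0.27
```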

By reader stratum and level

The audience effect is largest on general-population content and smallest on expert content, the opposite of what we expected, though the expert and advanced slices are four pages each and carry little weight.

Stratum	Pages	Priority change, correct audience vs. none
General population	32	10.4%
Professional	24	7.8%
Expert / clinical	4	3.6%

Level	Pages	Change	Net promotions
Beginner	15	12.2%	−0.80
Intermediate	41	8.2%	−0.10
Advanced	4	3.6%	0.00

Negative net promotions mean the correct audience demoted more concepts than it promoted.

The judge panel

Three blind reviewer models were asked which of the two recommendation sets, no audience or correct audience, served the reader better. The raw number is the cautionary one: its lead was a position artifact, and removing the bias collapses it to a coin flip.

Measure	Value	Detail
Raw preference for the audience output	63.8%	30 of 47 decided
After a split-half position correction	57.6%	estimated from order
Counterbalanced (both orders, 6 votes/page)	51.9%	14 of 27 decided, 29 ties
Judges picking whichever set was shown second	69%	the position bias
Inter-judge agreement, Fleiss κ	0.93 → 0.43	raw → counterbalanced
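Counterbalancing shows each judge both orders, so a page gets six votes. In this sketch, a page goes to the majority and a 3–3 split is a tie. The vote record (`Vote`) and the majority rule are hypothetical, since this page does not spell out how verdicts were formed. The example shows how a judge that always picks the second-shown set produces a tie instead of a win.

```python
from collections import Counter
from typing import NamedTuple


class Vote(NamedTuple):  # hypothetical record format
    page: str
    audience_shown_first: bool  # which presentation order this vote saw
    picked_audience: bool


def page_verdicts(votes):
    """Majority over each page's votes (3 judges x 2 orders = 6); an even split is a tie."""
    n_votes, n_audience = Counter(), Counter()
    for v in votes:
        n_votes[v.page] += 1
        n_audience[v.page] += v.picked_audience
    verdicts = {}
    for page, n in n_votes.items():
        a = n_audience[page]
        verdicts[page] = "audience" if 2 * a > n else "baseline" if 2 * a < n else "tie"
    return verdicts


def second_position_rate(votes):
    """Share of votes that chose whichever set was shown second."""
    return sum(v.picked_audience != v.audience_shown_first for v in votes) / len(votes)


# p1: every vote picks the second-shown set. p2: every vote picks the audience set.
votes = [Vote("p1", True, False)] * 3 + [Vote("p1", False, True)] * 3
votes += [Vote("p2", True, True)] * 3 + [Vote("p2", False, True)] * 3
print(page_verdicts(votes))         # {'p1': 'tie', 'p2': 'audience'}
print(second_position_rate(votes))  # 0.75
```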

The counterbalanced binomial test gives p = 1.0. The fall in Fleiss κ from 0.93 to 0.43 shows the high agreement was largely agreement on position, not on content.
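Both statistics are short to compute. The exact two-sided binomial test against an even split sums every outcome no more likely than the observed one. With 14 of 27, no outcome is more likely, so p is exactly 1. The Fleiss κ sketch assumes a table with one row per rated item and one column per choice. How the study tabulated its votes for κ is not described on this page.

```python
from math import comb


def binom_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test against p = 0.5 (integer arithmetic, no rounding)."""
    counts = [comb(n, i) for i in range(n + 1)]
    extreme = sum(c for c in counts if c <= counts[k])
    return min(1.0, extreme / 2**n)


def fleiss_kappa(table):
    """Fleiss' kappa; table[i][j] = number of raters assigning item i to category j.

    Every row must sum to the same rater count n >= 2.
    """
    N, n, k = len(table), sum(table[0]), len(table[0])
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]           # category shares
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in table]  # per-item agreement
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (P_bar - P_e) / (1 - P_e)


print(binom_two_sided(14, 27))              # 1.0
print(fleiss_kappa([[3, 0], [0, 3]]))       # 1.0: judges agree on every item
```

If the judges agree because they all favor the second slot, κ rewards that shared habit. Counterbalancing breaks the habit's link to one output, which is consistent with the drop to 0.43.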

Reproduction

The corpus and its repair provenance, the pre-classifications, all 480 second-phase outputs, the embeddings, the per-page priority analysis, the counterbalanced judge verdicts, and the aggregate statistics are persisted as JSON in the project repository, alongside the runner that produced them.
