ContentGrapher
The audience study · June 2026 · n = 60 pages

Data

The full numbers, with their counts. Read the methodology for how each was measured.

Coverage and determinism

Two facts frame everything else. The audience does not change which concepts are recommended, and the tool does not change its own answer from run to run.

| Measure | Value | Reading |
| --- | --- | --- |
| Concept-set overlap, correct audience vs. none | 0.994 | Same concepts recommended either way |
| Run-to-run concept overlap, same input twice | 0.996 | Effectively deterministic; nothing for an audience to stabilize |
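
This page does not say which overlap measure produced the figures above. As a minimal sketch, assuming Jaccard similarity between the two concept sets of each page, averaged across pages, the calculation might look like this. The function names and the toy concept sets are hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two concept sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_overlap(pairs: list[tuple[set[str], set[str]]]) -> float:
    """Average the per-page overlap across all pages."""
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Toy data (not from the study): one (no-audience, correct-audience) pair per page.
pages = [
    ({"yeast", "proofing", "gluten"}, {"yeast", "proofing", "gluten"}),
    ({"salary", "equity", "vesting"}, {"salary", "equity"}),
]
print(round(mean_overlap(pages), 3))  # 0.833
```

The same function covers the run-to-run row if each pair holds two runs of the same input instead of two audience conditions.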

Priority change by condition

The effect lives in the priority tiers. Each figure is the mean share, across 60 pages, of shared concepts whose tier (essential, important, useful) differs between the two conditions.

| Comparison | Change | Reading |
| --- | --- | --- |
| Run the same input twice, no audience | 5.9% | The noise floor: how much priorities move on their own |
| Wrong role (recruiter for a recipe) | 7.5% | Barely above the floor |
| Correct audience | 8.9% | The real effect |
| Wrong level (flip the knowledge level) | 10.1% | The largest change among the comparisons against the no-audience baseline |
| Isolate the level (change only level) | 11.6% | From the correct audience, change one field |
| Isolate the role (change only role) | 7.7% | From the correct audience, change one field |
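
The priority-change figure can be sketched in a few lines. This is an illustration of the definition given above (share of shared concepts whose tier differs, averaged over pages), not the study's runner; the function names, the dictionary layout, and the toy data are assumptions.

```python
from __future__ import annotations


def tier_change_share(run_a: dict[str, str], run_b: dict[str, str]) -> float | None:
    """Share of concepts present in both runs whose tier differs."""
    shared = run_a.keys() & run_b.keys()
    if not shared:
        return None  # no shared concepts, so the share is undefined for this page
    changed = sum(1 for concept in shared if run_a[concept] != run_b[concept])
    return changed / len(shared)


def mean_change(pages: list[tuple[dict[str, str], dict[str, str]]]) -> float:
    """Mean per-page share, skipping pages with no shared concepts."""
    shares = [s for s in (tier_change_share(a, b) for a, b in pages) if s is not None]
    return sum(shares) / len(shares)


# Toy data (not from the study): concept -> tier under two conditions.
page_1 = (
    {"yeast": "essential", "proofing": "important", "gluten": "useful", "scoring": "useful"},
    {"yeast": "essential", "proofing": "essential", "gluten": "useful", "hydration": "useful"},
)
page_2 = (
    {"salary": "essential", "equity": "important"},
    {"salary": "essential", "equity": "important"},
)
print(round(mean_change([page_1, page_2]), 3))  # 0.167
```

Concepts that appear in only one condition ("scoring", "hydration") are ignored, which is why the measure is insensitive to the small differences in concept coverage.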

The tests

Wilcoxon signed-rank, paired by page, n = 60.

| Test | Comparison | Statistic | Reading |
| --- | --- | --- | --- |
| Audience beats noise | Correct audience vs. run-to-run floor | z = −2.27, p = 0.023 | Significant. The audience moves priorities beyond the tool’s own wobble. |
| Role correctness | Correct role vs. wrong role (both vs. A) | z = −0.81, p = 0.42 | Not significant. A correct role reshuffles no more than a wrong one. |
| Level vs. role | Change only level vs. change only role | z = −1.94, p = 0.053 | Borderline. The level tends to move more than the role. |
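
For readers who want to rerun a test of this kind, here is a minimal pure-Python sketch of a paired Wilcoxon signed-rank test with a normal approximation. It assumes zero differences are dropped, tied ranks are averaged with the usual variance correction, and no continuity correction is applied; the study's exact settings are not stated on this page, so results may differ slightly. The function name and toy data are hypothetical.

```python
from __future__ import annotations

import math


def wilcoxon_signed_rank(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (z, two-sided p) for paired samples via the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)  # the smaller rank sum, so z is never positive

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return z, p


# Toy paired values (not from the study), one pair per page.
with_audience = [9, 12, 7, 15, 10, 11, 8, 14]
noise_floor = [6, 7, 8, 9, 6, 4, 6, 6]
z, p = wilcoxon_signed_rank(with_audience, noise_floor)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Taking the smaller rank sum makes z non-positive, which matches the sign convention of the statistics in the table. The reported p-values are also consistent with a two-sided normal tail: for z = −2.27, `math.erfc(2.27 / math.sqrt(2))` is about 0.023.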

Direction of the change

When the correct audience moves a concept's tier, it more often moves it down than up: 28 promotions against 44 demotions across the corpus, a net of −0.27 per page. The pull is strongest on beginner pages, where a deterministic depth filter demotes a few foundational concepts on its own. That filter is the reason to read the beginner row below as part model, part rule.
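
The net figure is simple arithmetic on the corpus totals. The sketch below reproduces it and shows one way promotions and demotions could be counted per page, assuming the tiers are ordered useful < important < essential; the function name and the rank mapping are illustrative assumptions.

```python
from __future__ import annotations

# Assumed ordering of the three tiers, lowest to highest priority.
TIER_RANK = {"useful": 0, "important": 1, "essential": 2}


def count_moves(baseline: dict[str, str], audience: dict[str, str]) -> tuple[int, int]:
    """Count (promotions, demotions) among concepts present in both conditions."""
    promotions = demotions = 0
    for concept in baseline.keys() & audience.keys():
        delta = TIER_RANK[audience[concept]] - TIER_RANK[baseline[concept]]
        if delta > 0:
            promotions += 1
        elif delta < 0:
            demotions += 1
    return promotions, demotions


# Corpus totals reported in the text above.
promotions, demotions, pages = 28, 44, 60
net_per_page = (promotions - demotions) / pages
print(round(net_per_page, 2))  # -0.27
```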

By reader stratum and level

The audience effect is largest on general-population content and smallest on expert content, the opposite of what we expected, though the expert and advanced slices are four pages each and carry little weight.

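| Stratum | Pages | Priority change, B vs. A |
| --- | --- | --- |
| General population | 32 | 10.4% |
| Professional | 24 | 7.8% |
| Expert / clinical | 4 | 3.6% |

| Level | Pages | Change | Net promotions |
| --- | --- | --- | --- |
| Beginner | 15 | 12.2% | −0.80 |
| Intermediate | 41 | 8.2% | −0.10 |
| Advanced | 4 | 3.6% | 0.00 |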

Negative net promotions mean the correct audience demoted more concepts than it promoted.

The judge panel

Three blind reviewer models were asked which of the two recommendations, no-audience or correct-audience, served the reader better. The raw number is the cautionary one: its lead was largely a position artifact, and counterbalancing the presentation order collapses it to a coin flip.

| Measure | Value | Detail |
| --- | --- | --- |
| Raw preference for the audience output | 63.8% | 30 of 47 decided |
| After a split-half position correction | 57.6% | estimated from order |
| Counterbalanced (both orders, 6 votes/page) | 51.9% | 14 of 27 decided, 29 ties |
| Judges picking whichever set was shown second | 69% | the position bias |
| Inter-judge agreement, Fleiss κ | 0.93 → 0.43 | raw → counterbalanced |

Counterbalanced binomial p = 1.0. The fall in Fleiss κ from 0.93 to 0.43 shows the high agreement was largely agreement on position, not on content.
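
Both statistics can be checked with standard formulas. The sketch below assumes an exact two-sided binomial test against a 50/50 null and the textbook Fleiss κ over three verdict categories (audience output, no-audience output, tie). Function names and the toy verdict counts are hypothetical, and the study's own implementation may differ.

```python
from __future__ import annotations

from math import comb


def binomial_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test against p = 0.5.

    Sums the probability of every outcome no more likely than the observed one.
    """
    pmf = [comb(n, i) / 2**n for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for float comparison
    return min(1.0, sum(q for q in pmf if q <= threshold))


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss kappa; counts[i][j] is how many raters put item i in category j.

    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed agreement: mean share of agreeing rater pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items
    # Chance agreement from the overall category proportions.
    totals = [sum(col) for col in zip(*counts)]
    p_chance = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_observed - p_chance) / (1 - p_chance)


# Counterbalanced result from the table: 14 of 27 decided pages favored the audience output.
print(binomial_two_sided(14, 27))  # 1.0

# Toy verdicts (not from the study): 3 judges, categories = [audience, no audience].
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
print(round(fleiss_kappa(toy_counts), 3))  # 0.333
```

With 27 decided pages, 14 is one of the two most likely outcomes under a fair coin, so no outcome is less extreme than it and the two-sided p-value is exactly 1.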

Reproduction

The corpus and its repair provenance, the pre-classifications, all 480 second-phase outputs, the embeddings, the per-page priority analysis, the counterbalanced judge verdicts, and the aggregate statistics are persisted as JSON in the project repository, alongside the runner that produced them.
