Does telling the tool who is reading change what it says?
ContentGrapher asks who the reader is before it analyzes a page: a role, a knowledge level, an optional task. This is the study of whether that input earns its place.
The short answer: it moves the priority emphasis, a little. It does not change which concepts the tool recommends. The knowledge level does almost all of the work, and the role barely matters: a wrong role disrupts the ranking nearly as much as the right one sharpens it. When we asked a blind panel which version was better, it could not tell them apart. Fill in the level. The role is optional.
We ran 60 pages through four versions of the audience input and measured what moved. This page is the report on a feature that earned a smaller place than we expected, and on a position bias that almost sold us a result that was not there.
What we were testing
Before ContentGrapher builds its recommendations, it can take an audience: a reader role like first home buyer or general practitioner, a knowledge level from beginner to advanced, and an optional task. We held the first half of the analysis fixed for each page and changed only the audience handed to the second half, the part that decides which concepts matter and how much.
Four versions per page. No audience, where the tool infers the reader from the page. The correct audience, assigned ahead of time by a separate classifier. And two deliberate wrongs: the right role with the level flipped, and the right level with the role swapped for an unrelated one. The two wrongs are the point of the design: they let us separate what the level is doing from what the role is doing. The full setup is in the methodology.
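The four versions can be pictured as a small data structure. The sketch below is illustrative only: the `Audience` class, the `build_conditions` helper, the condition names, and the beginner/advanced flip mapping are all assumptions made for this example, not ContentGrapher's actual interface.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Audience:
    """Hypothetical container for the audience input described above."""
    role: str
    level: str
    task: Optional[str] = None


# Assumption: "flipping" the level means swapping the two ends of the scale.
LEVEL_FLIP = {"beginner": "advanced", "advanced": "beginner"}


def build_conditions(correct: Audience, unrelated_role: str) -> Dict[str, Optional[Audience]]:
    """Return the four audience versions tested for one page."""
    return {
        "none": None,                                                   # tool infers the reader
        "correct": correct,                                             # classifier-assigned audience
        "wrong_level": replace(correct, level=LEVEL_FLIP[correct.level]),  # right role, level flipped
        "wrong_role": replace(correct, role=unrelated_role),            # right level, unrelated role
    }


conditions = build_conditions(Audience("first home buyer", "beginner"), "recruiter")
```

Because each wrong version changes exactly one field, any difference between it and the correct version can be attributed to that field alone.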
It changes the emphasis, not the coverage
The first thing to measure is whether the audience changes which concepts the tool recommends. It does not. With or without an audience, the recommended set is the same on page after page: the overlap is 99%. What the audience changes is the ranking: whether a concept is marked essential, important, or useful. And it changes about 9% of those tiers, against a floor of 6% that the tool produces just by running twice.
Across 60 pages, the set of concepts the tool recommends is the same whether or not you fill in the audience. The audience only re-ranks what is already there.
So the audience is a dial on emphasis, not a filter on content. This matches what the product already tells users: the boundary decisions, what belongs on the page and what does not, do not move when you change the reader. Only the priority order does.
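The two measurements in this section, set overlap and tier change, are easy to confuse, so here is a minimal sketch of how they differ. It assumes each run is a mapping from concept name to tier, uses Jaccard similarity for the overlap, and counts tier changes over shared concepts only. Those choices, the function names, and the sample concepts are assumptions for illustration; the methodology page defines the real metrics.

```python
from typing import Dict


def set_overlap(run_a: Dict[str, str], run_b: Dict[str, str]) -> float:
    """Jaccard overlap of the recommended concept sets, ignoring tiers."""
    a, b = set(run_a), set(run_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def tier_change_rate(run_a: Dict[str, str], run_b: Dict[str, str]) -> float:
    """Share of concepts present in both runs whose tier differs."""
    shared = set(run_a) & set(run_b)
    if not shared:
        return 0.0
    changed = sum(1 for concept in shared if run_a[concept] != run_b[concept])
    return changed / len(shared)


# Made-up example: same four concepts, one of them re-ranked.
no_audience = {"deposit": "essential", "stamp duty": "important",
               "conveyancing": "useful", "mortgage insurance": "useful"}
with_audience = {"deposit": "essential", "stamp duty": "important",
                 "conveyancing": "useful", "mortgage insurance": "important"}

overlap = set_overlap(no_audience, with_audience)        # coverage is identical
changed = tier_change_rate(no_audience, with_audience)   # one of four tiers moved
```

Running the same comparison on two runs with an identical input gives the noise floor, which is the baseline any audience effect has to clear.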
The level does the work, the role is context
If the audience only nudges priorities, the next question is which part of it does the nudging. This is why the two wrong versions exist. Swapping the role for an unrelated one, a recruiter for a recipe, changes 7.5% of the priorities. Giving the correct audience changes 8.9%. Those are close enough that the role's accuracy is adding almost nothing: a wrong role reshuffles the ranking nearly as much as the right one does.
Everything sits close to the noise floor. A wrong role (7.5%) reshuffles almost as much as the correct audience (8.9%), so the role's accuracy adds little. Flipping the level moves the most.
Hold the audience correct and change one field at a time, and the split is clear. Flipping the level moves half again as many priorities as swapping the role. The level is the field carrying the signal; the role reads more like context the model notices than a control it obeys.
The level is the load-bearing field. Flipping it moves half again as many priorities as swapping the role does (p=0.05). The role behaves more like context than a control.
We could not measure consistency
We set out to ask a second question: does giving the tool an audience make it steadier, less likely to shuffle its answer from one run to the next? We could not answer it, because there was nothing to steady. The analysis runs at a fixed temperature, so it is effectively deterministic: run the same page twice and the recommended concepts come back 99.6% the same. There is no run-to-run noise for the audience to reduce. A more exploratory setting might create some, and the audience might then act as a stabilizer, but that is a different study than this one.
A blind panel could not tell the difference
A 9% shuffle in priorities is real. The question that matters is whether it is an improvement. So we showed three models, from three different makers, the no-audience recommendations and the correct-audience recommendations for each page, blinded, and asked which set better served the reader. The first pass came back encouraging: the audience version won 64% of the decided calls.
Then we checked the order we had shown them in. The judges preferred whichever set appeared second on the page 69% of the time, a strong position bias, and our randomization had by chance shown the audience version second more often. So we ran every page again in both orders and counted across all of them, which cancels the bias by construction.
The judges picked whichever set was shown second 69% of the time. The raw lead was that position bias, not a real preference. Counterbalanced, it lands on a coin flip, and 29 of 56 pages were called a tie.
Counterbalanced, the preference is a coin flip: 52%, and 29 of 56 pages came back a tie. The judges that had looked so sure agreed with each other far less once they could not lean on position. The quality gain we thought we had measured was the order of the boxes, not the audience.
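The counterbalancing step can be sketched in a few lines. This is one possible scheme, not necessarily the one used in the study: it assumes a judge is a callable that returns "first" or "second", and it treats a page as a tie when the judge's pick flips with the order. The function names and that tie rule are assumptions for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional

Judge = Callable[[str, str], str]  # returns "first" or "second"


def counterbalanced_verdict(judge: Judge, baseline: str, treatment: str) -> str:
    """Show the pair in both orders and keep only order-independent picks."""
    pick_ab = judge(baseline, treatment)   # treatment shown second
    pick_ba = judge(treatment, baseline)   # treatment shown first
    treatment_wins = (pick_ab == "second") + (pick_ba == "first")
    if treatment_wins == 2:
        return "treatment"
    if treatment_wins == 0:
        return "baseline"
    return "tie"  # the pick followed the position, not the content


def summarize(verdicts: Iterable[str]) -> Dict[str, Optional[float]]:
    """Win rate for the treatment over decided calls, plus the tie count."""
    counts = Counter(verdicts)
    decided = counts["treatment"] + counts["baseline"]
    rate = counts["treatment"] / decided if decided else None
    return {"treatment_rate": rate, "ties": counts["tie"], "decided": decided}


# A judge with pure position bias always picks whatever it sees second.
always_second: Judge = lambda first, second: "second"
biased = counterbalanced_verdict(always_second, "no-audience set", "audience set")
```

A purely position-biased judge produces a tie on every page under this scheme, which is why a raw lead built on position cannot survive showing both orders.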
What we cannot claim
- This is not proof the audience input does nothing. It changes the priority ranking by a real, measurable amount. It just does not change which concepts are recommended, and a blind panel cannot grade the ranking change as better.
- The pipeline is deterministic, so the consistency question is untestable here, not answered. At a more exploratory setting the audience might act as a stabilizer. We did not test that setting.
- The expert and advanced slices are small, four pages each, because the most authoritative content is the hardest to fetch. The read that the effect is largest on general-population content is suggestive, not settled.
- Some of the priority change on beginner pages is a deterministic rule, not the model reasoning about the reader: a depth filter demotes a few foundational concepts when the audience is a beginner. The data page separates that out.
- The judges are models, not editors. They remove the bias of any single maker, but they share the blind spots models have with one another.
The answer
The audience survey earns a smaller place than we expected, and we would rather say that plainly than dress it up. The level field does measurable work: it sets the depth and moves the most priorities. So it stays, and it is the one field worth filling in. The role is closer to cosmetic: useful as context the model notices, not a control that changes the analysis, and a wrong one is no worse than none.
That lands almost exactly where the product already points. The role narrows the framing, the level sets the depth, and the boundary decisions hold steady underneath both. What this study adds is the honest size of the effect, and a caution we now apply everywhere: a blind panel of models carries a strong position bias, so show every comparison in both orders before you believe a margin. We almost shipped a 64% that was really a 52%.
This study sits next to the reliability study, which found a different mirage in the same place: a model lead that turned out to be noise. Both are about the same discipline, measuring whether a difference you can see is a difference that is there.