Why does the same concept sometimes get a different boundary classification between runs?

Concepts clearly within scope land consistently across runs; so do concepts clearly out of scope. Concepts near the scope threshold can land on either side. This is expected, and the boundary trigger is the most stable signal to read.

The short answer

Concepts with a clear structural fit on the page land consistently across runs. So do concepts that are clearly out of scope. The concepts that can land differently between runs are the borderline ones: those sitting near the threshold between core, supportive, adjacent, and excluded.

When this happens, it is not a fault in the analysis. It reflects a genuine architectural ambiguity in how the concept relates to the page. The boundary trigger, which explains the structural reason for the classification, is more stable than the label itself and is the right thing to read first. See Boundary Classifications for the four classes and how triggers work.

What stays stable and what can vary

Two parts of the classification are stable across runs on the same page. First, concepts clearly within the page's scope hold their classification. Second, concepts clearly outside the page's scope hold theirs. In addition, the boundary triggers, which are the structural reasons attached to each classification, tend to hold even when the label moves.

One part can vary across runs: which specific concepts near the scope threshold get flagged. The overall share of concepts the analysis flags as belonging elsewhere on a given page tends to be steady. The exact list of borderline concepts that make up that share is less so.

In practice this means a borderline concept may be classified as supportive in one run and adjacent in the next, while the rest of your concept list looks the same. The classification on your page is responding to a structural question that is genuinely close to the line, not to randomness in the system.

What the studies found

ContentGrapher has published two studies that document this pattern directly. The Findability Study measured how stable the "belongs elsewhere" recommendation list is when the same page is analyzed twice. The two lists overlap by about 60% on average. The study set a target of 70% before it ran, and the system came in under that target; the result was published rather than withheld. These figures are from the study's corpus; the specific overlap on your page will depend on your content and how many concepts sit near the scope threshold.

The Agreement Study ran the same question across eight AI models from six makers, reading 49 real pages twice each. Two findings from that study are relevant here. The first: each model's overall rate of flagging concepts as belonging elsewhere held steady across its two passes. The second: the specific list of concepts each model flagged overlapped only 20% to 43% between the two passes, depending on the model. These figures describe the study's 49-page corpus and eight-model panel; the pattern on your page will not produce identical numbers, but the underlying behavior is the same.

Both studies point to the same pattern: the rate at which a page produces "belongs elsewhere" calls is a stable signal. The specific list of borderline concepts that make up that rate is not. Both studies also found that the strongest signals were the ones that repeated, and treated those as the recommendations to act on.
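To make the difference between a stable rate and an unstable list concrete, here is a minimal Python sketch. It assumes you have copied the "belongs elsewhere" concepts from two runs into plain lists. The concept names and counts are invented for illustration, and the overlap measure (shared concepts divided by all concepts flagged in either run) is an assumption, since the exact metric the studies used is not stated here.

```python
def flag_rate(flagged, total_concepts):
    """Share of a page's concepts flagged as belonging elsewhere."""
    return len(flagged) / total_concepts if total_concepts else 0.0


def list_overlap(run_a, run_b):
    """Assumed overlap measure: shared flags / all flags seen in either run."""
    a, b = set(run_a), set(run_b)
    if not a and not b:
        return 1.0  # two empty lists agree completely
    return len(a & b) / len(a | b)


def repeated_flags(run_a, run_b):
    """Concepts flagged in both runs: the repeated, stronger signals."""
    return sorted(set(run_a) & set(run_b))


# Hypothetical data: two runs on the same 20-concept page.
run_1 = ["pricing tiers", "api limits", "onboarding", "sso setup"]
run_2 = ["pricing tiers", "api limits", "billing cycle", "audit logs"]
total = 20

rate_1 = flag_rate(run_1, total)          # same rate in both runs
rate_2 = flag_rate(run_2, total)
overlap = list_overlap(run_1, run_2)      # only part of the list repeats
repeated = repeated_flags(run_1, run_2)

print(f"rate run 1: {rate_1:.0%}, rate run 2: {rate_2:.0%}")
print(f"list overlap: {overlap:.0%}")
print("repeated flags:", repeated)
```

In this made-up example both runs flag the same share of the page, yet only two of the six distinct flagged concepts appear in both lists. Those two repeated concepts are the ones to act on first.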

What to do when a classification changed between runs

First, read the boundary trigger on both runs, not just the label. The trigger states the specific structural reason for the classification. If the trigger reason changed between runs, the classification is responding to a different structural cue and is worth investigating on its merits. If the trigger reason held and only the label moved between two adjacent classes, the concept is genuinely borderline and either verdict is defensible. The decision in that case is yours: keep it, link out, or shorten it.
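The reading order in this first step can be sketched as a small Python function. This is an illustration, not part of the product: the function name, the return values, and the exact-text comparison of triggers are all assumptions. It also assumes the four classes are ordered core, supportive, adjacent, excluded, so that "adjacent classes" means neighbors in that order. Triggers worded differently can still describe the same structural reason, so treat the text comparison as a stand-in for your own reading.

```python
# Assumed ordering of the four classes, from most to least in scope.
CLASS_ORDER = ["core", "supportive", "adjacent", "excluded"]


def triage(label_a, trigger_a, label_b, trigger_b):
    """Suggest how to read one concept's classification across two runs."""
    # Hypothetical comparison: normalized exact match of the trigger text.
    if trigger_a.strip().lower() != trigger_b.strip().lower():
        return "investigate"  # a different structural cue is in play
    if label_a == label_b:
        return "stable"
    step = abs(CLASS_ORDER.index(label_a) - CLASS_ORDER.index(label_b))
    # Same trigger, label moved one class: genuinely borderline.
    # A larger jump is not covered by the guidance above, so flag it (assumption).
    return "borderline" if step == 1 else "investigate"


print(triage("supportive", "Mentioned once, no dependents",
             "adjacent", "mentioned once, no dependents"))
```

A "borderline" result means either verdict is defensible and the decision to keep, link out, or shorten is yours.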

Second, compare the two runs directly in the analysis output. Read the concept lists side by side and note which concepts held their classification and which moved. The classifications that flip between runs are the borderline ones; the ones that held are your stable structural picture. See Boundary Classifications for what each classification means and what action it implies.
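If you prefer to do the side-by-side comparison outside the interface, a minimal Python sketch follows. It assumes you have transcribed each run into a dictionary mapping concept to classification; that structure, the function name, and the example data are hypothetical, and how you get the data out of the analysis output is not covered here.

```python
def compare_runs(run_a, run_b):
    """Split concepts into held, moved, and seen-in-one-run-only."""
    held, moved, only_one = {}, {}, {}
    for concept in sorted(set(run_a) | set(run_b)):
        a, b = run_a.get(concept), run_b.get(concept)
        if a is None or b is None:
            only_one[concept] = (a, b)  # listed in just one of the runs
        elif a == b:
            held[concept] = a           # part of the stable picture
        else:
            moved[concept] = (a, b)     # candidate borderline concept
    return held, moved, only_one


# Hypothetical classifications from two runs on the same page.
run_a = {"pricing tiers": "core", "api limits": "supportive",
         "sso setup": "adjacent", "audit logs": "excluded"}
run_b = {"pricing tiers": "core", "api limits": "adjacent",
         "sso setup": "adjacent", "billing cycle": "adjacent"}

held, moved, only_one = compare_runs(run_a, run_b)
for concept, (a, b) in moved.items():
    print(f"{concept}: {a} -> {b}")
```

Concepts in `moved` are the ones to check against their boundary triggers; concepts in `held` need no further attention.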

Third, if the variance is concentrated on a single concept that matters to you, edit the page to make its role on the page less ambiguous. Strengthen its integration if it should be core or supportive. Reduce its prominence if it should be adjacent. Then re-analyze. A clearer structural position produces a more decisive classification.

The same pattern shows up at the page level

The split-or-keep recommendation behaves the same way as the borderline-concept list. The summary verdict can vary between runs while the underlying per-entity architectural decisions stay stable. See why the structural recommendation changes between runs for how to read that output. The principle is the same: read the stable, structural detail underneath the verdict before you read the verdict.