Why does the structural recommendation change between runs?
The split or keep recommendation is a weighted judgment, not a deterministic calculation. The boundary map, the underlying architectural data, is stable even when the verdict varies.
Structural decisions are judgment calls, not calculations
The split or keep recommendation from Phase 2c is a judgment based on architectural signals in your content, so different runs can reach different conclusions on the same page. This is not a bug: the structural decision is the highest-variance output ContentGrapher produces. Phase 2, which includes the structural recommendation, requires an analysis credit; Phase 1 is free.
The pipeline frames this as its assessed recommendation, not a deterministic answer. The signals it weighs are the same between runs, but the weighted judgment that results can land on different sides of a threshold.
The variance is a signal about your page
In a 30-run controlled study, no condition produced unanimous agreement across its five runs. Even the most clearly framed audience conditions showed at least one run in the opposite direction. For many pages, the structural decision is a judgment call on architecture that sits genuinely near a threshold.
If the verdict across your runs is 4 in one direction and 1 in the other, the page is near a decision threshold. Both answers are technically defensible given the architecture signals. The divergence tells you something about your page's structural position, not about the reliability of the pipeline.
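If you record the label from each run yourself, a small tally makes the 4-to-1 pattern described above explicit. The following is a minimal sketch, not part of ContentGrapher: the function name `summarize_verdicts` and the plain list of "split"/"keep" strings are assumptions made for illustration.

```python
from collections import Counter


def summarize_verdicts(verdicts):
    """Tally split/keep labels collected from repeated runs on one page.

    `verdicts` is a hypothetical list of strings you noted down by hand;
    it is not an export format produced by the pipeline.
    """
    if not verdicts:
        raise ValueError("need at least one verdict")
    counts = Counter(verdicts)
    majority, majority_count = counts.most_common(1)[0]
    return {
        "counts": dict(counts),
        "majority": majority,
        # Share of runs that agree with the majority label.
        "agreement": majority_count / len(verdicts),
        "unanimous": len(counts) == 1,
    }


# Five runs on the same page: four say split, one says keep.
runs = ["split", "split", "keep", "split", "split"]
summary = summarize_verdicts(runs)
print(summary)
```

An agreement share below 1.0 is the near-threshold signal discussed above. It says where the page sits structurally, not that any single run was wrong.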
What stays stable
The boundary map is the set of per-entity architectural decisions showing which concepts to keep, shorten, move, or create. It was constant across all 30 runs in the study: 15 boundary decisions in every case. The structural verdict (split or keep) varies. The underlying architectural data does not.
Use the boundary map as your authoritative architectural input. It tells you which concepts belong on this page, which should be shortened, which should move to a separate page, and which need to be created. This data is stable even when the summary verdict is not.
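To confirm that the boundary map really is the same across your own runs, you can compare the per-entity decisions directly. This sketch assumes you have transcribed each run's boundary map into a dictionary from concept name to one of the four actions (keep, shorten, move, create). That representation, the helper name `boundary_map_diff`, and the sample concept names are hypothetical, not a ContentGrapher format.

```python
def boundary_map_diff(maps):
    """Return the concepts whose action differs between runs.

    `maps` is a list of dicts, one per run, mapping a concept name to
    "keep", "shorten", "move", or "create" (an assumed, hand-built shape).
    A concept missing from a run shows up as None for that run.
    """
    concepts = set().union(*(m.keys() for m in maps))
    diffs = {}
    for concept in sorted(concepts):
        actions = [m.get(concept) for m in maps]
        if len(set(actions)) > 1:
            diffs[concept] = actions
    return diffs


# Two illustrative runs with identical boundary decisions.
run_a = {"pricing tiers": "keep", "api limits": "move", "glossary": "create"}
run_b = {"pricing tiers": "keep", "api limits": "move", "glossary": "create"}
stable = boundary_map_diff([run_a, run_b])

# A third, made-up run that disagrees on one concept.
run_c = {"pricing tiers": "shorten", "api limits": "move", "glossary": "create"}
changed = boundary_map_diff([run_a, run_b, run_c])
print(stable, changed)
```

An empty result means the runs agree on every concept, which is the case in which the boundary map can be treated as the authoritative input even if the split or keep label differs.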
How to use this in practice
Read the reasoning provided alongside each structural decision, not just the label. If the reasoning is consistent across runs, that consistency is the signal worth acting on, even if the label flips between split and keep. If the reasoning also shifts between runs, the page's architecture is genuinely near a threshold and the structural decision is yours to make.
The boundary map and the supporting reasons give you the architectural picture. The split or keep label is the pipeline's best inference from that picture. They are different outputs and should be used differently.