The findability study: Data

The full data

Every table behind the study, including all 30 per-domain results, the classifier stability runs, and the complete results of the unpublished hardening run the study is built on.

Headline: all 30 pages, six conditions

Condition	Routing @0.6	Findability @0.6 (cosine)	Findability (judge)
Treatment	83.1%	84.2%	68.3%
Treatment-narrow	83.1%	84.2%	67.5%
Decoy	0.0%	3.9%	16.9%
Addition-only	83.1%	84.2%	68.3%
Random	0.0%	6.7%	17.5%
Source-only (before)	0.0%	6.4%	18.1%

Judge findability is the 3-temperature majority vote. The judge credits decoy hubs when residual source-page content partially answers a query, which is why its gap is narrower than the cosine gap. Both are reported; the claim uses cosine with the judge as corroboration.
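
As a rough illustration of the majority vote described above, the sketch below reduces three judge verdicts for one query to a single label. The function name and the sample verdicts are hypothetical; the actual temperatures and judge prompt are not given on this page.

```python
from collections import Counter


def majority_vote(verdicts):
    """Return the verdict given by most judge runs (binary verdicts, odd run count)."""
    if len(verdicts) % 2 == 0:
        raise ValueError("use an odd number of runs so a binary vote cannot tie")
    winner, _count = Counter(verdicts).most_common(1)[0]
    return winner


# Hypothetical verdicts for one query, one per judge temperature.
runs = [True, False, True]
print(majority_vote(runs))  # True
```

With three runs, a query counts as findable whenever at least two runs agree that it is.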

Bootstrapped confidence intervals

Intervals come from 10,000 bootstrap resamples over per-source means. The pre-registered floor required the lower bound of the 95% CI to sit above 5pp.

Delta	Mean	95% CI
Routing delta (treatment minus decoy)	83.1pp	[75.8, 89.7]
Findability delta (treatment minus decoy, cosine)	80.3pp	[72.2, 88.1]
Treatment routing, absolute	83.1%	[75.6, 90.0]
Treatment findability, absolute	84.2%	[76.9, 90.8]
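
A minimal sketch of how intervals like these can be produced, assuming a plain percentile bootstrap (the page does not say which bootstrap variant was used). The `bootstrap_ci` helper and the sample deltas are illustrative, not the study's data.

```python
import random
import statistics


def bootstrap_ci(per_source, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-source values."""
    rng = random.Random(seed)
    n = len(per_source)
    # Resample sources with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(per_source, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(per_source), lower, upper


# Hypothetical per-source deltas in percentage points (treatment minus decoy).
deltas = [100, 100, 67, 83, 50, 100, 75, 100, 33, 100]
mean, lower, upper = bootstrap_ci(deltas)
print(f"mean {mean:.1f}pp, 95% CI [{lower:.1f}, {upper:.1f}]")
print("clears the 5pp floor:", lower > 5)
```

Resampling whole sources rather than individual queries keeps each source's queries together, which is what "over per-source means" implies.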

By retrieval role

Role	n	Routing @0.6	Findability @0.6	Decoy findability
explain	8	79.2%	79.2%	4.2%
convert	6	83.3%	83.3%	2.8%
guide	6	91.7%	97.2%	11.1%
compare	5	85.0%	85.0%	0.0%
evaluate	5	76.7%	76.7%	0.0%

The two roles admitted under relaxed corpus gates (compare, evaluate) sit inside the range of the others. The weakest role, evaluate at 76.7%, still beats its decoy by 76.7pp (its decoy findability is 0.0%).
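
The role figures appear to be plain means of the per-source rates. As a check, the sketch below recomputes the evaluate row from the five evaluate sources in the per-domain table, assuming the rounded 33% and 50% cells stand for exactly 1/3 and 1/2.

```python
from fractions import Fraction

# Per-source findability for the five "evaluate" sources (per-domain table).
# Assumption: the rounded 33% and 50% cells are exactly 1/3 and 1/2.
evaluate = {
    "wpic.co": Fraction(1, 3),
    "remote100k.com": Fraction(1),
    "workday.com": Fraction(1),
    "assetpanda.com": Fraction(1),
    "teramind.co": Fraction(1, 2),
}

# Unweighted mean across sources, expressed as a percentage.
role_mean = 100 * float(sum(evaluate.values()) / len(evaluate))
print(f"evaluate findability: {role_mean:.1f}%")  # 76.7%
```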

All 30 domains

Per-source treatment results at the strictest threshold. Adjacent-share is the density of belongs-elsewhere material on the source page. Note that it does not predict accuracy: the two 3% sources both hit 100% while the 39% and 41% sources sit at 67%.

Domain	Role	Words	Adj-share	Routing	Findability
research.ibm.com	explain	1,790	22%	100%	100%
yoast.com	convert	1,729	16%	100%	100%
uniqode.com	convert	1,781	16%	100%	100%
intrepidtravel.com	explain	3,205	15%	100%	100%
veterinary.rossu.edu	guide	1,851	12%	67%	100%
business.adobe.com (buyer)	explain	2,669	12%	67%	67%
vuejs.org	explain	1,500	10%	100%	100%
docs.anthropic.com (buyer)	compare	1,246	39%	67%	67%
wpic.co	evaluate	1,343	14%	33%	33%
blog.hootsuite.com (buyer)	guide	3,485	6%	83%	83%
techpp.com	compare	2,319	9%	83%	83%
remote100k.com	evaluate	1,506	7%	100%	100%
aws.amazon.com (buyer)	convert	1,221	26%	83%	83%
semrush.com (buyer)	guide	3,123	5%	100%	100%
joist.com	compare	3,237	5%	75%	75%
workday.com	evaluate	1,520	3%	100%	100%
gov.uk	guide	1,272	12%	100%	100%
uplead.com	convert	1,331	10%	67%	67%
rapidseedbox.com	compare	5,015	3%	100%	100%
assetpanda.com	evaluate	2,294	5%	100%	100%
docs.aws.amazon.com (buyer)	explain	1,472	17%	50%	50%
vanquis.com	convert	1,811	7%	67%	67%
growthmarketing.studio	guide	1,787	6%	100%	100%
dataally.ai	compare	2,041	8%	100%	100%
teramind.co	evaluate	2,430	5%	50%	50%
zendesk.com (buyer)	explain	2,935	8%	50%	50%
buffer.com (buyer)	explain	1,331	5%	100%	100%
capitalworldgroup.com	explain	1,469	41%	67%	67%
oxfordpartners.com.au	convert	1,346	16%	83%	83%
nextjs.org	guide	1,370	13%	100%	100%

“buyer” marks the 8 sources admitted under the buyer-recognition gate. aws.amazon.com and docs.aws.amazon.com share a root domain by accepted exception; they are distinct properties with different roles.

Classifier stability runs

Mean Jaccard between rerun belongs-elsewhere sets	0.615 (floor: 0.7)
Sources below the floor	7 of 10
Best source (concept membership)	Jaccard 1.0 across all three pairs
Worst source	Jaccard 0.281
Destination-name (slug) stability	0.368
Top-3 destination set identical across reruns	2 of 10

10 sources, 3 fresh classifier reruns each, identical inputs. The published claim attaches to the modal belongs-elsewhere list across reruns, per the pre-registered policy described in the methodology.
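
For readers unfamiliar with the metric, here is a minimal sketch of a per-source stability score: the mean Jaccard similarity over the three pairs of reruns. The concept labels are hypothetical, and treating two empty sets as identical is an assumption.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection over union of two sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(reruns):
    """Average Jaccard over every pair of reruns (three pairs for three reruns)."""
    pairs = list(combinations(reruns, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical belongs-elsewhere sets from three reruns on one source.
reruns = [
    {"pricing", "integrations", "gdpr"},
    {"pricing", "integrations"},
    {"pricing", "gdpr", "onboarding"},
]
score = mean_pairwise_jaccard(reruns)
print(round(score, 3), "below floor" if score < 0.7 else "at or above floor")
```

A source scores 1.0 only when all three reruns return exactly the same set, which is the "Jaccard 1.0 across all three pairs" case reported for the best source.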

Embedding portability

Measurement re-run on the 10 cleanest-signal sources, destination prose unchanged, fresh embeddings and indexes per model. Floor: positive routing delta on at least 7 of 10 sources per model.

Model	Routing delta	Findability delta	Positive sources
text-embedding-3-large (baseline)	96.7pp	97.5pp	10/10
text-embedding-3-small	95.0pp	91.7pp	10/10
bge-m3 (open source, local)	98.3pp	59.2pp	9/10

voyage-3-large was specified and not run (no API key was provisioned). bge-m3's findability delta is compressed by the fixed 0.6 threshold being effectively stricter in its cosine space; its routing delta is the largest of the three.
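
The toy sketch below shows how a fixed threshold can compress findability while leaving routing intact. It assumes, for illustration only, that routing asks whether the top-scoring chunk is the destination and that findability additionally requires that top score to reach 0.6; the exact definitions live in the methodology. The 2-d vectors and model names are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def best_match(query, chunks):
    """Return (chunk_id, score) for the highest-cosine chunk."""
    scored = ((cid, cosine(query, vec)) for cid, vec in chunks.items())
    return max(scored, key=lambda pair: pair[1])


THRESHOLD = 0.6
query = [1.0, 0.0]

# Toy embeddings of the same two chunks under two hypothetical models.
model_a = {"destination": [0.8, 0.6], "source": [0.2, 0.98]}
model_b = {"destination": [0.5, 0.87], "source": [0.1, 0.99]}

for name, chunks in [("model_a", model_a), ("model_b", model_b)]:
    chunk_id, score = best_match(query, chunks)
    print(name, chunk_id, round(score, 2), score >= THRESHOLD)
```

Both toy models rank the destination first, but only the first clears 0.6. That is the pattern the bge-m3 row suggests: routing holds while thresholded findability drops.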

Real-page sidecar (teramind.co)

Four source pages where the recommendation mapped to a page that already exists on the site. Each cell shows purpose-written destination / real existing page.

Source page	Queries	Routing @0.6	Findability @0.6
/blog/how-to-detect-shadow-ai	2	100% / 50%	100% / 100%
/blog/pros-and-cons-of-employee-monitoring	4	50% / 0%	50% / 0%
/blog/insider-threats	6	67% / 33%	100% / 83%
/blog/ai-usage-control	2	100% / 0%	100% / 0%
Mean	14	79% / 21%	88% / 46%

Illustrative only: 4 sources, 14 queries, operator-judged mappings. Where the real page genuinely covers the moved concept it reaches 83% to 100% findability; where the mapped concepts were product-specific, marketing-written pages lost to purpose-written explainers.

The unpublished hardening run, in full

The study's direct predecessor ran the same six-condition design on 10 sources and was held back from publication for the reasons documented in the methodology. Its complete headline table is published here for the record, because the published study's design claims only make sense against it.

Condition (n=10)	Routing @0.6	Findability @0.6 (cosine)	Findability (judge)
Before (source only)	n/a	10%	37%
Treatment	92%	95%	75%
Treatment-narrow	92%	95%	75%
Decoy	0%	12%	33%
Addition-only	92%	95%	77%
Random	0%	13%	35%

Supporting results from that run: sign test 58 of 58 non-tied wins (p = 3.47e-18); chunk-size sweep at 256, 512, and 1024 tokens identical on the tested source; judge internal consistency 94% 3-of-3; cross-model judge calibration 93% agreement on 30 borderline cases.
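
The reported sign-test p-values are consistent with a one-sided exact sign test on non-tied pairs, where each pair is a fair coin flip under the null. That reading is an assumption, and the sketch below simply checks the arithmetic for the hardening run's 58 of 58 and the published study's 164 of 164.

```python
from math import comb


def sign_test_p(wins, n):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


print(f"{sign_test_p(58, 58):.2e}")    # 3.47e-18
print(f"{sign_test_p(164, 164):.1e}")  # 4.3e-50
```

When every non-tied pair is a win, the p-value reduces to 0.5 raised to the number of pairs.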

Hardening run vs the published study

Measure	Hardening run (unpublished)	The findability study
Sources	10	30, role-balanced
Routing accuracy @0.6 (treatment / decoy)	92% / 0%	83.1% / 0.0%
Findability @0.6, cosine (treatment / decoy)	95% / 12%	84.2% / 3.9%
Findability, judge (treatment / decoy)	75% / 33%	68.3% / 16.9%
Findability delta	+83pp (no CI at n=10)	+80.3pp, 95% CI [72.2, 88.1]
Sign test	58/58 wins, p = 3.47e-18	164/164 wins, p = 4.3e-50
Addition-only vs treatment	identical (removal decorative)	identical (reproduced)
Random arm vs decoy	indistinguishable	indistinguishable (reproduced)
Embedding models	1	3
Classifier stability	not measured	Jaccard 0.615, 7/10 below floor

Reading the drop from 95% to 84%. The published study's absolute numbers are lower than the hardening run's because the corpus tripled and was forced to include the page types the original gates excluded. That is the point of the exercise: the hardening run's 95% was measured on a corpus selected for exactly the anatomy the signal works best on. The 84% with a confidence interval is the number we are willing to publish; every structural pattern (removal decorative, random arm at baseline, treatment-narrow indistinguishable) reproduced exactly at triple the sample.
