The page type study: Data

The numbers

Per-type summary

Across all 25 topics, for each page type: the mean coverage score; the share of pages that a cross-family judge panel flagged for at least one type-inappropriate missing concept; and the overlap between repeated runs on the same page, which sets the noise floor.

Page type	Mean coverage	Inappropriate-flag pages	Re-run overlap
Product	0.41	72%	0.99
Category	0.57	72%	0.98
Blog	0.55	32%	0.97
Landing	0.38	96%	1.00
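The mean-coverage column can be cross-checked against the per-topic table further down. The sketch below assumes each type's mean is a plain unweighted average over the 25 topics (the page does not state the weighting). Under that assumption it reproduces 0.41, 0.57, 0.55 and 0.38 to two decimals.

```python
# Per-topic coverage scores in column order (Prod, Cat, Blog, Land),
# transcribed from the per-topic table on this page.
SCORES = {
    "standing desk": (0.53, 0.55, 0.40, 0.34),
    "espresso machine": (0.23, 0.15, 0.57, 0.03),
    "robot vacuum": (0.51, 0.47, 0.30, 0.52),
    "office chair": (0.42, 0.25, 0.65, 0.53),
    "mechanical keyboard": (0.59, 0.30, 0.59, 0.50),
    "air purifier": (0.45, 0.45, 0.57, 0.33),
    "electric kettle": (0.38, 0.37, 0.65, 0.53),
    "running shoes": (0.34, 0.40, 0.76, 0.73),
    "mattress": (0.38, 0.63, 0.54, 0.26),
    "blender": (0.43, 0.12, 0.53, 0.25),
    "project management tool": (0.53, 0.86, 0.32, 0.29),
    "vpn": (0.38, 0.78, 0.68, 0.23),
    "password manager": (0.45, 0.88, 0.54, 0.74),
    "email marketing software": (0.49, 0.65, 0.59, 0.19),
    "crm software": (0.30, 0.56, 0.52, 0.12),
    "web hosting": (0.38, 0.80, 0.69, 0.22),
    "video conferencing software": (0.34, 0.75, 0.54, 0.53),
    "accounting software": (0.32, 0.70, 0.52, 0.39),
    "website builder": (0.42, 0.84, 0.50, 0.35),
    "online tax preparation": (0.50, 0.70, 0.72, 0.43),
    "online language tutoring": (0.33, 0.49, 0.51, 0.47),
    "car insurance": (0.48, 0.77, 0.57, 0.37),
    "meal kit delivery": (0.33, 0.72, 0.52, 0.46),
    "home cleaning service": (0.38, 0.67, 0.58, 0.27),
    "personal training": (0.37, 0.37, 0.32, 0.37),
}

TYPES = ("Product", "Category", "Blog", "Landing")


def mean_by_type(scores):
    """Unweighted mean coverage per page type across all topics (assumed weighting)."""
    n = len(scores)
    return {
        page_type: sum(row[i] for row in scores.values()) / n
        for i, page_type in enumerate(TYPES)
    }


if __name__ == "__main__":
    for page_type, mean in mean_by_type(SCORES).items():
        print(f"{page_type:<9} {mean:.2f}")
```

Running it prints one line per type. A weighted mean, for example by concept count, would need per-page data that this page does not show.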

On the inappropriate-flag rate, product minus blog is +40 percentage points (72% vs 32%; bootstrap 95% CI 16 to 60 points). The flag rate is computed on the present-but-underdeveloped concept set. The strict absent-only set was empty for product (0 of 25) and near-empty for the other types. Treat this as a directional result, not a calibrated rate; see the calibration note below.
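A percentile bootstrap for a difference in flag rates can be sketched as follows. This is an illustration, not the study's procedure. The 0/1 vectors are hypothetical, built only to match the summary counts (18 of 25 product pages, 8 of 25 blog pages). Each group is resampled independently. The page does not say whether its bootstrap was paired by topic, nor how many resamples or which seed it used, so the interval this prints will not match 16 to 60 exactly.

```python
import random


def bootstrap_diff_ci(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group independently."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        ra = rng.choices(a, k=len(a))  # resample group a with replacement
        rb = rng.choices(b, k=len(b))  # resample group b with replacement
        diffs.append(sum(ra) / len(ra) - sum(rb) / len(rb))
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical flag vectors (1 = page flagged) matching 72% and 32% of 25 pages.
product_flags = [1] * 18 + [0] * 7
blog_flags = [1] * 8 + [0] * 17

point = sum(product_flags) / 25 - sum(blog_flags) / 25
lo, hi = bootstrap_diff_ci(product_flags, blog_flags)
print(f"diff = {point:+.2f}, 95% CI [{lo:+.2f}, {hi:+.2f}]")
```

Because every topic appears under every type, resampling topics in pairs would be the natural alternative. That would require the per-topic flags, which this page does not list.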

Calibration: judge panel vs decorrelated reference panel

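Concept-level “inappropriate” rate by type on the shared 130-concept calibration sample. The two panels broadly agree on direction: both rate category and landing concepts above blog. They differ on product, which the judge panel places marginally below blog (14% vs 15%) and the reference panel places well above it (41% vs 18%). The reference panel draws the line far more strictly on every commercial type, which is why overall agreement was 60%, below the 75% bar.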

Page type	Judge panel	Reference panel
Landing	19%	87%
Category	29%	55%
Product	14%	41%
Blog	15%	18%

Judge panel: Gemini 2.5 Flash, DeepSeek V4 Pro, GPT-4.1-mini. Reference panel: Kimi K2.6, Qwen 3.5, Mistral Large. Overall agreement 60% (Wilson 95% CI 51% to 68%).
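The agreement interval can be reproduced with the Wilson score formula. The sketch assumes that “overall agreement” is simple percent agreement on a per-concept verdict over all 130 calibration concepts, which implies 78 agreeing concepts (60% of 130). Under that assumption it recovers the published 0.51 to 0.68. The `percent_agreement` helper is illustrative and is not taken from the study's code.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def percent_agreement(a, b):
    """Share of items on which two panels return the same verdict (assumed metric)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


lo, hi = wilson_interval(78, 130)
print(f"agreement 0.60, 95% CI [{lo:.2f}, {hi:.2f}]")  # → [0.51, 0.68]
print("meets 75% bar:", 78 / 130 >= 0.75)  # → False
```

The Wilson interval is used here because it stays well-behaved near 0 and 1, unlike the simple normal approximation. At n = 130 and p = 0.6 the two differ only slightly.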

Per topic

Each row is one topic. “Band” groups topics by kind: physical product, SaaS, or service. The Prod, Cat, Blog, and Land columns give the coverage score for that topic rendered as each page type. “Overlap” is the cross-type expected-concept overlap (Jaccard) for the topic. Against a re-run floor near 0.99, these near-zero values are the core divergence result. “Crossed” marks whether the four types' scores fell into at least two different published score bands.
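Jaccard overlap between two concept sets is the size of their intersection divided by the size of their union. The page does not say how the four per-type sets are combined into one number per topic. The sketch below assumes the mean of the six pairwise Jaccard scores. A stricter alternative would divide the four-way intersection by the four-way union. The concept sets are invented for illustration only.

```python
from itertools import combinations


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, taken as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cross_type_overlap(sets_by_type):
    """Mean pairwise Jaccard over all pairs of page types (assumed aggregation)."""
    pairs = list(combinations(sets_by_type.values(), 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical expected-concept sets for one topic; not the tool's real output.
example = {
    "product": {"price", "boiler type", "pump pressure", "warranty"},
    "category": {"price", "brand filter", "comparison table"},
    "blog": {"grind size", "pump pressure", "milk frothing", "extraction time"},
    "landing": {"call to action", "free shipping"},
}

print(round(cross_type_overlap(example), 3))  # → 0.052
print(jaccard(example["product"], set(example["product"])))  # identical re-run → 1.0
```

A re-run that returns the same set scores 1.0. This is why a re-run floor near 0.99 makes cross-type values below 0.1 stand out.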

Topic	Band	Prod	Cat	Blog	Land	Overlap	Crossed
standing desk	physical	0.53	0.55	0.40	0.34	0.014	yes
espresso machine	physical	0.23	0.15	0.57	0.03	0.000	yes
robot vacuum	physical	0.51	0.47	0.30	0.52	0.018	yes
office chair	physical	0.42	0.25	0.65	0.53	0.013	yes
mechanical keyboard	physical	0.59	0.30	0.59	0.50	0.012	yes
air purifier	physical	0.45	0.45	0.57	0.33	0.021	yes
electric kettle	physical	0.38	0.37	0.65	0.53	0.047	yes
running shoes	physical	0.34	0.40	0.76	0.73	0.019	yes
mattress	physical	0.38	0.63	0.54	0.26	0.012	yes
blender	physical	0.43	0.12	0.53	0.25	0.006	yes
project management tool	saas	0.53	0.86	0.32	0.29	0.029	yes
vpn	saas	0.38	0.78	0.68	0.23	0.032	yes
password manager	saas	0.45	0.88	0.54	0.74	0.026	yes
email marketing software	saas	0.49	0.65	0.59	0.19	0.038	yes
crm software	saas	0.30	0.56	0.52	0.12	0.040	yes
web hosting	saas	0.38	0.80	0.69	0.22	0.037	yes
video conferencing software	saas	0.34	0.75	0.54	0.53	0.028	yes
accounting software	saas	0.32	0.70	0.52	0.39	0.075	yes
website builder	saas	0.42	0.84	0.50	0.35	0.040	yes
online tax preparation	service	0.50	0.70	0.72	0.43	0.007	yes
online language tutoring	service	0.33	0.49	0.51	0.47	0.006	yes
car insurance	service	0.48	0.77	0.57	0.37	0.059	yes
meal kit delivery	service	0.33	0.72	0.52	0.46	0.032	yes
home cleaning service	service	0.38	0.67	0.58	0.27	0.006	yes
personal training	service	0.37	0.37	0.32	0.37	0.020	yes

The cross-type overlap is at or near zero on every topic, and never above 0.075. The expected-concept sets the tool infers for a product page and for a blog post about the same subject share almost no concepts, while a single page re-runs to near-identical sets. All 25 topics crossed at least one score band across their four types. The structural split/keep decision flipped across types on 8 of 25 topics (exploratory).
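The “Crossed” check reduces to asking whether a topic's four scores land in more than one band. This page does not list the published band edges, so `BAND_EDGES` below is a hypothetical placeholder. The function names are illustrative only. A score exactly on an edge is assigned to the upper band.

```python
from bisect import bisect_right

# Hypothetical cut-points; the published score bands are not given on this page.
BAND_EDGES = (0.35, 0.50, 0.70)


def band_index(score, edges=BAND_EDGES):
    """Index of the band a score falls in (0 = below the first edge)."""
    return bisect_right(edges, score)


def crossed_band(scores, edges=BAND_EDGES):
    """True if one topic's per-type scores fall into more than one band."""
    return len({band_index(s, edges) for s in scores}) > 1


print(crossed_band((0.37, 0.37, 0.32, 0.37)))  # personal training row → True
print(crossed_band((0.38, 0.40, 0.45, 0.48)))  # made-up scores in one band → False
```

With real band edges swapped in, the same function could be run over every row of the per-topic table.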
