The data
Per language, averaged over 12 topics
Columns: the reported completeness score; the same score with its name-check neutralised; independent-reviewer accuracy; and the share of concepts the analysis named in English.
“Model saw” is the share of concepts the analysis judged well-connected; “Score counted” is the share that survived the name-check. The gap between them is the loss at the broken step: concepts the analysis recognised but the name-check discarded. Round-trip control (English → Japanese → English): score gap +0.01 [−0.03, 0.05]. n = 12 topics, three runs each.
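A minimal sketch of how the two shares relate, assuming each concept carries two yes/no flags: whether the analysis judged it well-connected, and whether it passed the name-check. The `Concept` record, its field names and the `shares` function are hypothetical, and treating the score as a plain share of concepts is a simplification; the real score may weight concepts differently. Neutralising the name-check (`name_check=False`) lets every well-connected concept count, so “Score counted” equals “Model saw” and the gap closes.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    # Hypothetical per-concept record; field names are illustrative only.
    name: str
    well_connected: bool     # the analysis judged the concept well-connected
    passes_name_check: bool  # the name-check accepted the concept's name


def shares(concepts: list[Concept], name_check: bool = True) -> dict[str, float]:
    """Return "model saw", "score counted" and the gap between them.

    Assumes the score is a plain share of concepts. With name_check=False
    the name-check is neutralised, so every well-connected concept counts.
    """
    if not concepts:
        raise ValueError("need at least one concept")
    n = len(concepts)
    saw = sum(c.well_connected for c in concepts) / n
    counted = sum(
        c.well_connected and (c.passes_name_check or not name_check)
        for c in concepts
    ) / n
    return {"model_saw": saw, "score_counted": counted, "gap": saw - counted}


# Made-up example: two well-connected German concepts fail the name-check.
concepts = [
    Concept("DNA", well_connected=True, passes_name_check=True),
    Concept("Doppelhelix", well_connected=True, passes_name_check=False),
    Concept("Replikation", well_connected=True, passes_name_check=False),
    Concept("Gen", well_connected=False, passes_name_check=True),
]
print(shares(concepts))                    # {'model_saw': 0.75, 'score_counted': 0.25, 'gap': 0.5}
print(shares(concepts, name_check=False))  # gap is 0.0
```

In this toy case the analysis "saw" three of four concepts, but the score counted only one, because the other two were dropped at the name-check rather than judged poorly connected.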
Per topic, reported score by language
Each row is one page, scored in all five languages on identical content. The non-English score is lower in nearly every cell; the few exceptions (CBT, one Japanese run) show that the effect is a strong tendency rather than a fixed penalty.
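The per-topic comparison, and an interval like the one quoted for the round-trip control, can be sketched as follows. The article does not say how its interval was computed; this sketch assumes the three runs in each cell are averaged first and a percentile bootstrap is then taken over topics. The function names, language codes and the scores in `runs` are hypothetical, not the article's data.

```python
import random
from statistics import mean


def cell_means(runs: dict[str, dict[str, list[float]]]) -> dict[str, dict[str, float]]:
    """Average the repeated runs in each (topic, language) cell."""
    return {topic: {lang: mean(scores) for lang, scores in langs.items()}
            for topic, langs in runs.items()}


def share_lower(cells: dict[str, dict[str, float]], reference: str = "en") -> float:
    """Fraction of non-reference cells scoring below the reference on the same topic."""
    flags = [score < langs[reference]
             for langs in cells.values()
             for lang, score in langs.items() if lang != reference]
    return sum(flags) / len(flags)


def topic_gaps(cells: dict[str, dict[str, float]], lang: str,
               reference: str = "en") -> list[float]:
    """Per-topic gap (reference minus lang); positive means lang scored lower."""
    return [langs[reference] - langs[lang] for langs in cells.values()]


def bootstrap_ci(gaps: list[float], n_boot: int = 10_000, alpha: float = 0.05,
                 seed: int = 0) -> tuple[float, float, float]:
    """Mean gap with a percentile bootstrap interval, resampling topics."""
    rng = random.Random(seed)
    boots = sorted(mean(rng.choices(gaps, k=len(gaps))) for _ in range(n_boot))
    lo = boots[int(n_boot * alpha / 2)]
    hi = boots[int(n_boot * (1 - alpha / 2)) - 1]
    return mean(gaps), lo, hi


# Made-up scores: three runs per cell, two topics, three languages.
runs = {
    "topic_a": {"en": [0.8, 0.7, 0.9], "de": [0.5, 0.6, 0.4], "ja": [0.6, 0.5, 0.7]},
    "topic_b": {"en": [0.6, 0.6, 0.6], "de": [0.7, 0.6, 0.8], "ja": [0.3, 0.4, 0.5]},
}
cells = cell_means(runs)
print(share_lower(cells))  # 0.75: one of the four non-English cells is an exception
print(bootstrap_ci(topic_gaps(cells, "de")))
```

Resampling whole topics, rather than individual runs, keeps each topic's paired comparison intact, which matches the table's design of scoring identical content in every language. With only two toy topics the interval is very wide; it only narrows with more topics, such as the twelve used here.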
The German score in the DNA row (0.08) is an extreme case of the same mechanism, compounded by a one-off analysis hiccup on that run; the remaining cells follow the same pattern. Tiers are general, professional, and expert.