ContentGrapher
The language study · June 2026

Can ContentGrapher read non-English content?

People run non-English pages through ContentGrapher, and the tool gives them a lower score. We wanted to know whether that lower score is real, or an artifact. So we ran the same explanatory content through it in five languages and had reviewers fluent in each language check the result. The answer is clear: the tool understands non-English content as well as it understands English. The score it currently reports does not reflect that, for a reason that is fixable.

The finding: ContentGrapher reads non-English content correctly. Reviewers fluent in French, German, Japanese and Korean rated its concept analysis as accurate as English or better. But the completeness score it reports drops by 0.15 to 0.23 for the same content in another language, because of one scoring step that checks for concept names in the page and does not account for the analysis naming those concepts in English. Today, treat non-English scores as unreliable; the underlying analysis is sound.

ContentGrapher was built and tuned on English. Until now we had never measured what happens when its English-shaped analysis meets content in another language, so we could not tell users whether to trust it. This study answers that, and it separates two very different possibilities: a tool that genuinely cannot analyse other languages, and a tool that analyses them fine but reports a broken number.

The setup

We took 12 real explanatory pages across general, professional and expert topics, and produced a faithful translation of each into French, German, Japanese and Korean. Translation faithfulness was verified independently, and all versions passed. Then we ran every version through ContentGrapher and compared three things against the English original:

  1. A: What users see. The completeness score the tool reports for each version.
  2. B: What the model understood. The concepts and connections the analysis actually found, before the score is calculated.
  3. C: An independent check. Reviewers fluent in each language judging whether the tool found the right concepts for the native text.

What we found

Layers B and C agree, and they disagree with A. The analysis is right; the reported score is wrong.

Did the tool identify the right concepts? · independent reviewers, 0–3

English   2.0
French    2.1
German    2.1
Japanese  2.9
Korean    2.8

The completeness score the tool reported · 0–1

English   0.60
French    0.41
German    0.45
Japanese  0.37
Korean    0.37

Same content, every language. Reviewers fluent in each language judged the tool’s concept extraction as accurate as English or better, yet the score it reported fell by 0.15 to 0.23. The understanding held; the score did not. n = 12 topics, three reviewers per item.
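The 0.15 to 0.23 range follows directly from the reported scores. A minimal sketch of the arithmetic, using only the figures in the chart above (the function and variable names are ours, for illustration):

```python
# Reported completeness scores, copied from the chart above (0-1 scale).
reported = {
    "English": 0.60,
    "French": 0.41,
    "German": 0.45,
    "Japanese": 0.37,
    "Korean": 0.37,
}


def drops_vs_baseline(scores, baseline="English"):
    """Return how far each non-baseline score falls below the baseline."""
    base = scores[baseline]
    # Round to two decimals to match the precision of the published figures.
    return {
        lang: round(base - score, 2)
        for lang, score in scores.items()
        if lang != baseline
    }


drops = drops_vs_baseline(reported)
for lang, drop in drops.items():
    print(f"{lang}: score is {drop:.2f} below English")

print("smallest drop:", min(drops.values()))
print("largest drop:", max(drops.values()))
```

German sits at the low end of the range, Japanese and Korean at the high end.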

The model understands the content

On the same content, the analysis identified well-connected concepts at the same rate or higher in every non-English language than in English. The independent reviewers agreed: concept-extraction accuracy was as good as English or better, and notably higher for Japanese and Korean. We lean on the outside panel rather than the tool’s own confidence because our Reliability Study showed that a tool’s self-assessment can diverge from an independent check. By every measure of understanding, the tool reads the content.

The score breaks on a name check

The drop traces to one step. When the analysis reads non-English text, it tends to name the concepts it finds in English: it reads a Japanese page about 光合成 and lists “photosynthesis,” “chlorophyll,” “the Calvin cycle.” A later scoring step then checks whether each concept’s name appears in the page text. The English names are not in the Japanese page, so the check fails and the concepts stop counting as covered. The more concepts get English names, the larger the drop, which is why French (69% English-named), Japanese (74%) and Korean (77%) fall furthest, and German (42%, because its terms resemble English) falls least.
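The failure mode is easy to reproduce in miniature. The sketch below is not ContentGrapher's scoring code; it assumes the name check is a plain substring match, and the function name and both sample sentences are invented for illustration:

```python
def name_check_coverage(page_text, concept_names):
    """Share of concepts whose name appears verbatim in the page text.

    Assumption: the check is a case-insensitive substring match.
    """
    if not concept_names:
        return 0.0
    text = page_text.casefold()
    hits = sum(1 for name in concept_names if name.casefold() in text)
    return hits / len(concept_names)


# Concept names as the analysis tends to emit them: in English.
labels = ["photosynthesis", "chlorophyll", "the Calvin cycle"]

# Illustrative sample sentences, not taken from the study's pages.
english_page = (
    "Photosynthesis uses chlorophyll to capture light. "
    "The Calvin cycle then fixes carbon."
)
japanese_page = "光合成ではクロロフィルが光を吸収し、カルビン回路が炭素を固定する。"

print("English page: ", name_check_coverage(english_page, labels))
print("Japanese page:", name_check_coverage(japanese_page, labels))
```

Both pages cover the same three concepts, but only the English page contains the English names, so only it gets credit.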

Concepts the tool labelled in English · by content language

English   17%
French    69%
German    42%
Japanese  74%
Korean    77%

Well-connected concepts: what the model saw vs what the score counted

English: saw 58% → counted 47%
French: saw 65% → counted 18%
German: saw 62% → counted 29%
Japanese: saw 68% → counted 16%
Korean: saw 68% → counted 11%

The model saw concepts as well-connected at a similar or higher rate in every non-English language than in English. The score then checks whether each concept’s name appears in the text. For non-English pages the names are in English, so the check fails and the concepts stop counting. The wider the English-label gap, the larger the drop.
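The link between English labelling and lost concepts can be checked against the published figures. A small sketch using only the percentages from the two charts above (variable names are ours):

```python
# Percentages copied from the charts above.
saw = {"English": 58, "French": 65, "German": 62, "Japanese": 68, "Korean": 68}
counted = {"English": 47, "French": 18, "German": 29, "Japanese": 16, "Korean": 11}
english_named = {"English": 17, "French": 69, "German": 42, "Japanese": 74, "Korean": 77}

# Percentage points of well-connected concepts lost between analysis and score.
lost = {lang: saw[lang] - counted[lang] for lang in saw}

# Order the languages by each measure, smallest first, and compare.
by_loss = sorted(lost, key=lost.get)
by_english_share = sorted(english_named, key=english_named.get)

for lang in by_loss:
    print(f"{lang}: {english_named[lang]}% English-named, {lost[lang]} points lost")

print("same ordering:", by_loss == by_english_share)
```

With five languages this is a consistency check rather than a statistical test, but the ordering by English-label share matches the ordering by points lost.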

It is the language, not the translation

We ran a control: take the English original, translate it to Japanese, translate it back to English, and score that. It scored the same as the original, within noise. So translation does not destroy structure. The drop appears only when the scored text is actually in another language, which is exactly what the name-check explanation predicts.

What this means

If you are analysing non-English content today, the structural read is trustworthy but the headline score is not: it understates the page. The gap is not a limit of the analysis; it comes from a scoring step that assumes the concept names it produces will appear in the page, which holds for English and breaks for everything else. That is a fixable problem: either match concepts in the page’s own language or name them in that language.
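One way the first fix could look, as a hedged sketch rather than the planned implementation: let each concept carry surface forms in the page's language and count it as covered if any form appears. The alias mapping, function name and sample sentence are all hypothetical:

```python
def coverage_with_aliases(page_text, concepts):
    """Share of concepts covered, accepting any known surface form.

    `concepts` maps an English label to a list of alternative names,
    for example the term in the page's own language (hypothetical data).
    """
    if not concepts:
        return 0.0
    text = page_text.casefold()
    hits = 0
    for label, aliases in concepts.items():
        forms = [label, *aliases]
        if any(form.casefold() in text for form in forms):
            hits += 1
    return hits / len(concepts)


# Illustrative sample sentence, not taken from the study's pages.
japanese_page = "光合成ではクロロフィルが光を吸収し、カルビン回路が炭素を固定する。"

english_only = {"photosynthesis": [], "chlorophyll": [], "the Calvin cycle": []}
with_native_names = {
    "photosynthesis": ["光合成"],
    "chlorophyll": ["クロロフィル"],
    "the Calvin cycle": ["カルビン回路"],
}

print("English names only:", coverage_with_aliases(japanese_page, english_only))
print("With native names: ", coverage_with_aliases(japanese_page, with_native_names))
```

The second option in the text, naming concepts in the page's language to begin with, would make the alias list unnecessary; which is preferable depends on where the naming behaviour is easiest to change.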

What this study does not claim

  1. It is not a claim that the score is correct in English either. It only shows that the same content scores lower in another language while the analysis stays as good.
  2. It tests faithful translations of English explanatory pages, not content written natively in another language. Content that uses concepts with no English equivalent is a separate question.
  3. The independent reviewers were fluent AI readers from three different makers, not native human reviewers. We verified the translations were faithful rather than grading native completeness.
  4. It covers explanatory pages only. Guides, comparisons and transactional pages are out of scope.
  5. It is a point-in-time test of the current analysis. The naming behaviour and the scoring step can both change.