Lexical Diversity

Assessment, Language Analysis, LD, vocabulary diversity, vocabulary range

The range of distinct words a writer or speaker uses in a stretch of language. Operationalised as the relationship between types (unique word forms) and tokens (total running words). High diversity indicates a wide active vocabulary; low diversity indicates lexical repetition. The construct sits inside the broader notion of lexical richness, alongside density (content vs grammatical words) and sophistication (proportion of advanced vocabulary).

The deceptive part of the construct is that the obvious operationalisation, dividing types by tokens, does not work. The type-token ratio (TTR) falls as text length grows: function words and other high-frequency items recur constantly, so each additional stretch of text adds tokens faster than it adds new types. A 100-word text and a 10,000-word text by the same writer cannot be compared on raw TTR. Decades of lexical-diversity research consist of attempts to fix this length sensitivity.
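The length effect is easy to reproduce. The following is a minimal sketch, assuming whitespace tokenisation and lower-casing as a stand-in for whatever tokeniser a real tool would use; the function names `ttr` and `ttr_by_prefix` and the sample sentence are illustrative only.

```python
def ttr(tokens):
    """Type-token ratio: distinct word forms divided by running words."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


def ttr_by_prefix(tokens, step):
    """TTR of ever-longer prefixes of the same text, as (length, ttr) pairs."""
    return [(n, ttr(tokens[:n])) for n in range(step, len(tokens) + 1, step)]


# Illustrative sample text (an assumption, not data from any study).
sample = (
    "the cat sat on the mat and the dog sat on the rug "
    "and the cat saw the dog on the mat"
).lower().split()

for length, value in ttr_by_prefix(sample, 6):
    print(length, round(value, 2))
```

Even in this toy text the ratio drops at every step although the "writer" has not changed, which is why raw TTR values from texts of different lengths are not comparable.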

The four families of measures

Length-controlled means. Compute TTR over fixed-size windows and average the results. The Moving Average TTR (MATTR, Covington & McFall 2010) slides a window of fixed length across the text one token at a time. Robust to length, but the score depends on the chosen window size and only registers repetition that falls within a single window.
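A moving-average TTR can be sketched as follows. This is an illustration of the sliding-window idea rather than a reference implementation; the default window of 50 tokens and the fallback to plain TTR for texts shorter than the window are assumptions made for this example.

```python
from collections import Counter


def mattr(tokens, window=50):
    """Mean TTR over every window of `window` consecutive tokens."""
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    if n <= window:
        # Assumption for this sketch: fall back to plain TTR on short texts.
        return len(set(tokens)) / n

    counts = Counter(tokens[:window])   # word frequencies inside the window
    total = len(counts) / window        # TTR of the first window
    for i in range(window, n):
        outgoing = tokens[i - window]   # token leaving the window
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1          # token entering the window
        total += len(counts) / window
    return total / (n - window + 1)     # number of windows visited
```

Because every window has the same size, a text that simply continues in the same style scores the same at 20 tokens as at 200, for example `mattr(["a", "b"] * 10, window=4)` and `mattr(["a", "b"] * 100, window=4)` are equal.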

Length-corrected formulas. Transform the raw type and token counts with a length-normalising function: Guiraud's index (types / √tokens), Herdan's C (log types / log tokens), Carroll's CTTR (types / √(2 × tokens)). Older approaches; partially length-stable but still drift on very long or very short texts.
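The three corrections named above are one-line transformations of the type and token counts. A minimal sketch, using the standard textbook definitions of each index; the function names are chosen for this example.

```python
import math


def guiraud(types, tokens):
    """Guiraud's index (root TTR): types / sqrt(tokens)."""
    return types / math.sqrt(tokens)


def herdan_c(types, tokens):
    """Herdan's C (log TTR): log(types) / log(tokens)."""
    if tokens < 2:
        raise ValueError("Herdan's C is undefined for fewer than 2 tokens")
    return math.log(types) / math.log(tokens)


def cttr(types, tokens):
    """Carroll's corrected TTR: types / sqrt(2 * tokens)."""
    return types / math.sqrt(2 * tokens)


# Example counts (illustrative): 50 distinct forms in 100 running words.
print(guiraud(50, 100), herdan_c(50, 100), cttr(50, 100))
```

All three slow the decay of TTR by shrinking the denominator, but none removes the dependence on length entirely, which matches the drift described above.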

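Probabilistic curve-fits. Model how TTR decays with sample size and report a parameter of the curve. vocd-D (Malvern & Richards 2002) draws repeated random samples of 35 to 50 tokens, plots mean TTR against sample size, and outputs the parameter D of the theoretical curve that best fits those points. HD-D reaches a closely related value directly from the hypergeometric distribution, without random sampling. Clever in principle; vocd-D is sensitive to the random-sampling routine in practice.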
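The probabilistic idea can be illustrated with HD-D, the index assessed alongside vocd-D in McCarthy and Jarvis (2010). The sketch below does not reproduce vocd-D's sampling and curve fitting. Instead it computes, from the hypergeometric distribution, the TTR one would expect in a random sample drawn from the text. The default sample size of 42 tokens and the function name `hdd` are assumptions of this example.

```python
from collections import Counter
from math import comb


def hdd(tokens, sample_size=42):
    """Expected TTR of a random sample of `sample_size` tokens (HD-D style)."""
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")

    all_samples = comb(n, sample_size)  # ways to draw the sample from the text
    score = 0.0
    for freq in Counter(tokens).values():
        # Probability that a type occurring `freq` times is absent from the
        # sample: every drawn token comes from the other n - freq tokens.
        p_absent = comb(n - freq, sample_size) / all_samples
        # Each type contributes its chance of appearing, scaled by 1/sample_size.
        score += (1.0 - p_absent) / sample_size
    return score
```

Because the probabilities are computed exactly, the result is deterministic: running it twice on the same text gives the same value, which is the practical advantage over a routine that relies on random sampling.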

Sequence-based. Walk the text and look for the point at which lexical repetition begins to dominate. MTLD (McCarthy 2005) reports the mean length of sequential word strings that maintain a TTR above 0.72; each time the running TTR drops to that threshold, a factor is counted and the calculation resets. The most length-stable of the major indices in the McCarthy and Jarvis (2010) validation.
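The sequence-based procedure can be sketched as below. This is a simplified illustration, not the published reference implementation: it counts a factor when the running TTR is at or below the threshold, credits the leftover stretch as a partial factor, and averages a forward and a backward pass. Returning infinity for a text with no repetition at all is a choice made for this example.

```python
import math


def _mtld_one_pass(tokens, threshold):
    """Tokens per factor for a single left-to-right pass."""
    factors = 0.0
    types = set()
    count = 0
    for token in tokens:
        count += 1
        types.add(token)
        if len(types) / count <= threshold:
            factors += 1          # the string no longer stays above threshold
            types.clear()         # reset and start a new string
            count = 0
    if count > 0:
        # Partial factor: how far the leftover string has fallen towards the
        # threshold, as a fraction of a full factor.
        leftover_ttr = len(types) / count
        factors += (1.0 - leftover_ttr) / (1.0 - threshold)
    if factors == 0:
        return math.inf           # no repetition at all (sketch-only choice)
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Mean of the forward and backward passes over the token list."""
    if not tokens:
        raise ValueError("need at least one token")
    forward = _mtld_one_pass(tokens, threshold)
    backward = _mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2
```

Higher values mean the writer sustains variety for longer before repetition pulls the running TTR down; a text consisting of one repeated word scores 2.0, since every second token triggers a reset.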

Why no single index wins

McCarthy and Jarvis's 2010 validation study assessed MTLD alongside established indices, including vocd-D, HD-D, and Maas, against four validity criteria (convergent, divergent, internal, and incremental) and concluded that no single measure is sufficient. They advise reporting MTLD, vocd-D (or HD-D), and Maas together rather than relying on any one index, since the measures capture different facets of diversity that correlate imperfectly. Text Inspector follows the spirit of this advice: it reports MTLD and vocd-D side by side and refuses to give a single diversity number.

For CEFR alignment work and IELTS Writing assessment, the practical implication is that diversity scores can place texts in broad bands but cannot rank them finely across very different text lengths. A C1 essay of 250 words and a C2 essay of 400 words can be compared on MTLD but should not be compared on raw TTR.

Use in test development and learner assessment

Diversity matters in three places. In rating reliability, diversity scores are weak predictors of holistic writing quality on their own but strong predictors when combined with sophistication and accuracy measures. In learner-corpus research, longitudinal diversity gains are one of the few stable computational signals of vocabulary growth across CEFR levels. In materials design, target diversity bands give item writers a calibration target for graded readers and listening transcripts so the input does not silently drift up or down a level.

The same length-sensitivity concerns carry over to AI-generated text. Generated prose tends to score higher than human prose on raw TTR while scoring lower on MTLD, plausibly because generators repeat function-word patterns over long spans even as they avoid surface repetition. Diversity assessment is therefore one of the cheaper signals in AI-text detection, though never sufficient alone.

References

Malvern, D., Richards, B., Chipere, N. & Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Palgrave Macmillan.
McCarthy, P. M. (2005). An Assessment of the Range and Usefulness of Lexical Diversity Measures and the Potential of the Measure of Textual, Lexical Diversity (MTLD). PhD dissertation, University of Memphis.
McCarthy, P. M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
Covington, M. A. & McFall, J. D. (2010). Cutting the Gordian knot: The moving-average type-token ratio (MATTR). Journal of Quantitative Linguistics, 17(2), 94–100.
Jarvis, S. (2013). Capturing the diversity in lexical diversity. Language Learning, 63(s1), 87–106.
