MTLD
The Measure of Textual Lexical Diversity (MTLD) was developed by Philip McCarthy in his 2005 University of Memphis dissertation and validated against competing indices in McCarthy and Jarvis (2010). It is the most length-stable single measure of Lexical Diversity in the field's standard validation study, and one of the two diversity indices that Text Inspector reports by default, the other being vocd-D.
How it works
MTLD walks the text token by token and tracks the running type-token ratio (TTR). As soon as the TTR drops to the fixed criterion threshold of 0.72, MTLD records that span as one factor, resets the TTR, and starts again from the next token. After the full text is consumed, it counts the factors, giving fractional credit for the unfinished span at the end of the text (the partial-factor correction). The score is the total number of tokens divided by that factor count, that is, the mean number of tokens per factor.
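The arithmetic can be written out as below. The partial-factor term is the form given in McCarthy and Jarvis (2010); the symbols N (total tokens), F (completed factors), p (partial factor), and TTR_r (TTR of the unfinished remainder) are notation chosen here for illustration, and the numbers in the worked line are hypothetical rather than taken from a real text.

```latex
% Score: total tokens over full factors plus the partial factor
\[
\mathrm{MTLD} = \frac{N}{F + p},
\qquad
p = \frac{1 - \mathrm{TTR}_{r}}{1 - 0.72}
\]

% Hypothetical example: N = 340 tokens, F = 3 completed factors,
% and a remainder whose TTR has only fallen to 0.86
\[
p = \frac{1 - 0.86}{0.28} = 0.5,
\qquad
\mathrm{MTLD} = \frac{340}{3 + 0.5} \approx 97.1
\]
```

The partial factor measures how far the remainder has travelled from a TTR of 1.0 towards the 0.72 criterion, so a remainder that is halfway there counts as half a factor.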
A high MTLD score means the writer maintains lexical variation over long spans before repetition forces the TTR below threshold. A low score means repetition kicks in quickly and factors close fast. The 0.72 threshold is empirically derived: McCarthy ran the analysis at thresholds across the plausible range and chose the value where the measure was most stable across text lengths.
The standard implementation runs the analysis twice, once forward and once backward, and averages the two scores, which controls for asymmetric repetition patterns at the start versus the end of a text. Forward-only implementations produce systematically different scores. This is one of the inter-tool variance sources catalogued in Text Metric Implementation Variance.
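A minimal Python sketch of the bidirectional procedure follows. It is not the code of any particular tool. The function names (`mtld_pass`, `mtld`) are made up for this example, the input is assumed to be already tokenised, and closing a factor when the TTR is less than or equal to 0.72 (rather than strictly below it) is an assumption on which implementations differ.

```python
from typing import Sequence

THRESHOLD = 0.72  # criterion TTR at which a factor closes


def mtld_pass(tokens: Sequence[str], threshold: float = THRESHOLD) -> float:
    """One directional pass: total tokens divided by the factor count."""
    factors = 0.0
    types = set()  # distinct tokens seen in the current open span
    span = 0       # number of tokens in the current open span
    for tok in tokens:
        span += 1
        types.add(tok)
        # Assumption: '<=' closes the factor; some tools use '<'.
        if len(types) / span <= threshold:
            factors += 1.0
            types.clear()
            span = 0
    if span > 0:
        # Partial-factor credit for the unfinished span at the end.
        ttr = len(types) / span
        factors += (1.0 - ttr) / (1.0 - threshold)
    # With no repetition at all (or no tokens) the score is undefined.
    return len(tokens) / factors if factors > 0 else float("nan")


def mtld(tokens: Sequence[str], threshold: float = THRESHOLD) -> float:
    """Average of the forward and backward passes."""
    forward = mtld_pass(tokens, threshold)
    backward = mtld_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2.0


# Toy input: a varied opening followed by a repetitive ending.
toy = ["a", "b", "c", "d", "e", "f", "g", "h", "x", "x", "x", "x"]
print(round(mtld_pass(toy), 2))                  # forward only: 13.44
print(round(mtld_pass(list(reversed(toy))), 2))  # backward only: 6.0
print(round(mtld(toy), 2))                       # bidirectional: 9.72
```

The toy list is far shorter than any text MTLD is meant for, but it shows the asymmetry. Read forward, the repetition arrives too late to close a factor. Read backward, it closes two factors immediately. The forward-only and bidirectional scores therefore differ.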
What McCarthy and Jarvis found
The 2010 validation study tested MTLD, vocd-D, HD-D, and several older indices against four validity criteria: convergent, divergent, internal, and incremental. MTLD was the only index that did not vary as a function of text length within the study's range. It correlated strongly with rater judgements of lexical sophistication while remaining largely independent of Lexical Density and other constructs it should not be capturing. The paper concludes that MTLD should be reported alongside vocd-D (or HD-D) and the Maas index rather than on its own, since the indices capture different facets of diversity and the agreement between them is more diagnostic than any single index.
Typical ranges
MTLD scores are unbounded above but in practice cluster between roughly 50 and 200 for academic and journalistic prose. Texts well below 50 are heavily repetitive (controlled-language manuals, simple narratives); texts well above 200 are unusually rich (literary essays, expert academic writing). The IELTS Writing Task 2 model bands sit roughly in the 80–140 range; native-speaker academic prose typically scores 100–180.
These ranges are tool-dependent. Different tokenisers, lemmatisation choices, and forward-vs-bidirectional runs shift the absolute numbers by 5–15 points on identical text. The validation findings concern relative rankings, which are stable across implementations.
Use in ELT and test design
MTLD is the diversity index of choice for IELTS-style essay-length writing assessment because it does not penalise short texts the way TTR does and does not depend on the random-sampling routine that introduces noise into vocd-D. For learner-corpus tracking it is the most stable signal of vocabulary-range development across CEFR bands. For AI-generated text screening it is one of the few computational features where AI prose tends to score lower than human prose at matched length. That makes it a useful diagnostic, though never sufficient alone.
Key References
- McCarthy, P. M. (2005). An Assessment of the Range and Usefulness of Lexical Diversity Measures and the Potential of the Measure of Textual, Lexical Diversity (MTLD). PhD dissertation, University of Memphis.
- McCarthy, P. M. & Jarvis, S. (2010). MTLD, vocd-D, and HD-D: A validation study of sophisticated approaches to lexical diversity assessment. Behavior Research Methods, 42(2), 381–392.
- Koizumi, R. & In'nami, Y. (2012). Effects of text length on lexical diversity measures: Using short texts with less than 200 tokens. System, 40(4), 554–564.
See Also
- Lexical Diversity: the umbrella construct
- vocd-D / HD-D: the probabilistic family that McCarthy and Jarvis recommend reporting alongside MTLD
- Type-Token Ratio: the measure whose length sensitivity MTLD repairs
- Text Metric Implementation Variance: forward-only versus bidirectional, lemmatisation choices, and the inter-tool drift they produce