98% Coverage Threshold
The 98% coverage threshold is the claim, originating with Hu and Nation (2000), that an L2 reader needs to know roughly 98% of the running words (tokens) in a text for adequate unassisted comprehension, that is, comprehension without a dictionary, glossary, or teacher mediation.
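As a rough illustration of what "coverage" means here, the sketch below computes the share of running words in a text that appear in a set of known words. The function name `token_coverage`, the simple regex tokenizer, and the sample sentence are assumptions made for this example; published studies typically count by word family and handle proper nouns separately, which this sketch does not attempt.

```python
import re

def token_coverage(text, known_words):
    """Return the fraction of running words (tokens) in `text` that are in `known_words`.

    Simplified sketch: tokens are lowercase alphabetic strings, and each word
    form is matched literally (no lemmatization or word-family grouping).
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# Hypothetical sample: 13 running words, one of which ("rug") is unknown.
sample = "the cat sat on the mat and the dog sat on the rug"
known = {"the", "cat", "sat", "on", "mat", "and", "dog"}
coverage = token_coverage(sample, known)
print(f"coverage: {coverage:.1%}")
```

Note that coverage is counted over tokens, not distinct word types: a single unknown word that recurs often lowers coverage more than several unknown words that each appear once.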
The Hu and Nation study
Hu and Nation prepared four versions of a 673-word fiction text to give Chinese learners of English coverage rates of 100%, 95%, 90%, and 80%, replacing some of the words with invented nonsense words so that coverage could be controlled precisely. Each of the sixty-six learners read the text at one of the four coverage levels and was tested with a multiple-choice comprehension measure and a cued written recall. Mean comprehension dropped sharply as coverage fell, and few readers reached adequate comprehension below 95%; only in the 100% condition did most readers cross a working comprehension threshold. Extrapolating across the four conditions, Hu and Nation argued that around 98% coverage is needed for the majority of learners to achieve unassisted comprehension, with 95% serving at best a minority of readers.
The 98% figure became a standard reference point in L2 vocabulary research and in the design of graded readers.
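To make the four conditions concrete, the following back-of-the-envelope sketch converts each coverage rate into an approximate number of unknown tokens in a 673-word text and into the average spacing between unknown words. These are arithmetic implications of the percentages, not counts reported by the study, and the function name `unknown_density` is an assumption for illustration.

```python
TEXT_LENGTH = 673  # running words in the study text, as stated above

def unknown_density(coverage_pct, text_length=TEXT_LENGTH):
    """Return (approx. unknown tokens, running words per unknown token).

    `coverage_pct` is an integer percentage such as 98. The second value is
    None at 100% coverage, where there are no unknown tokens.
    """
    unknown_pct = 100 - coverage_pct
    unknown_tokens = round(text_length * unknown_pct / 100)
    words_per_unknown = None if unknown_pct == 0 else 100 / unknown_pct
    return unknown_tokens, words_per_unknown

for pct in (100, 98, 95, 90, 80):
    tokens, spacing = unknown_density(pct)
    spacing_text = "none" if spacing is None else f"1 in every {spacing:.0f} words"
    print(f"{pct}% coverage: about {tokens} unknown tokens ({spacing_text})")
```

Seen this way, the gap between 95% and 98% is the difference between meeting an unknown word roughly every 20 running words and roughly every 50.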
Refinement: assisted vs unassisted
Nation (2006) and Laufer and Ravenhorst-Kalovski (2010) later refined the framing into two thresholds rather than one. Coverage of 98% is treated as the optimal threshold for independent, pleasure-oriented reading without external support. Coverage of 95% is treated as a minimal threshold, sufficient when readers have dictionary access, glossing, teacher support, or strong topic familiarity. This two-threshold framing is now the standard reading of the literature, with 98% as the target for extensive reading and 95% as the floor for assisted or instructional reading.
Coverage to vocabulary size
Translated through Nation's BNC-derived word lists, 98% written coverage corresponds to roughly 8,000–9,000 word families plus proper nouns. This figure anchors the upper end of vocabulary-size targets for unassisted reading of authentic adult texts and explains why lower-band graded readers must remain tightly controlled within the most frequent 1,000–4,000 word families to keep coverage near 98% for their intended learners.
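The step from a coverage target to a vocabulary size can be sketched as a cumulative sum over a frequency-ranked list: add up the token counts of the most frequent word families until their share of all tokens reaches the target. The counts below are invented toy numbers chosen for easy arithmetic, not BNC data, and `families_needed` is a hypothetical helper name.

```python
from itertools import accumulate

def families_needed(ranked_counts, target=0.98):
    """Return how many top-ranked word families are needed to reach `target` coverage.

    `ranked_counts` lists token counts per word family, most frequent first.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in the interval (0, 1]")
    total = sum(ranked_counts)
    if total == 0:
        raise ValueError("ranked_counts must contain at least one token")
    for n, running in enumerate(accumulate(ranked_counts), start=1):
        if running / total >= target:
            return n
    return len(ranked_counts)  # fallback in case of floating-point rounding

# Toy frequency profile (hypothetical): 10 families, 1,000 tokens in total.
toy_counts = [500, 250, 120, 60, 30, 25, 8, 4, 2, 1]
print(families_needed(toy_counts, target=0.95))
print(families_needed(toy_counts, target=0.98))
```

Because word frequencies fall off steeply, each additional point of coverage near the top of the scale requires many more low-frequency families than the point before it, which is why the estimate for 98% sits so far above the most frequent few thousand families.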
Replication
Kremmel et al. (2023), writing in Language Learning, replicated Hu and Nation with a larger sample and improved instrumentation. Their results broadly support the original study's coverage-comprehension gradient while adding precision on individual variation around the threshold.
References
- Hu, M., & Nation, P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403–430.
- Kremmel, B., Indrarathne, B., Kormos, J., & Suzuki, S. (2023). Unknown vocabulary density and reading comprehension: Replicating Hu and Nation (2000). Language Learning, 73(S1). https://doi.org/10.1111/lang.12622
- Laufer, B., & Ravenhorst-Kalovski, G. C. (2010). Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension. Reading in a Foreign Language, 22(1), 15–30.
- Nation, I. S. P. (2006). How large a vocabulary is needed for reading and listening? The Canadian Modern Language Review, 63(1), 59–82. https://doi.org/10.3138/cmlr.63.1.59