Readability
Readability is the predicted ease with which a reader can understand a written text. The construct is operationalised through formulas, called readability indices, that combine surface features of a text into a single score, usually mapped to a school grade or to a comprehension-difficulty band. The classic indices (Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog, SMOG, Dale-Chall) all rest on two predictors: average sentence length and a measure of word difficulty (syllable count, proportion of polysyllabic words, or membership in a familiar-word list).
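As a rough illustration of how the two-predictor formulas work, the sketch below computes Flesch Reading Ease and Flesch-Kincaid Grade Level with their published coefficients. The helpers `count_syllables` and `text_stats` are assumptions made for this sketch: the syllable counter is a crude vowel-group heuristic and the sentence splitter breaks only on `.`, `!` and `?`. Real tools differ in exactly these choices, so scores vary between implementations.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (heuristic assumed for this sketch):
    count vowel groups, then drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep the syllable in "-le" endings such as "table".
    if count > 1 and re.search(r"[^aeiouy]e$", word) and not word.endswith("le"):
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (words, sentences, syllables) using naive splitting rules."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch (1948): higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Kincaid et al. (1975): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat. It was warm."
    w, s, syl = text_stats(sample)
    print((w, s, syl))                                 # (9, 2, 9)
    print(round(flesch_reading_ease(w, s, syl), 1))    # 117.7
    print(round(flesch_kincaid_grade(w, s, syl), 1))   # -2.0
```

The formulas are not clamped: on very short, monosyllabic text like the sample, Reading Ease exceeds 100 and the grade level goes negative.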
Modern computational readability adds discourse-level predictors. Coh-Metrix adds measures of lexical co-reference, causal connectives, and narrativity; the Lexile and ATOS frameworks embed readability in graded-text catalogues used at school-district scale. Tools such as TextEvaluator and Cambridge's English Profile map text features onto the levels of the Common European Framework of Reference (CEFR).
What readability captures and what it misses
Surface-feature formulas capture sentence- and word-level sources of difficulty reasonably well. They miss almost everything above that level: argument structure, anaphoric distance, world-knowledge demand, and the gap between the topic and the reader's schema. A passage with short sentences and short words can still be hard to read if it requires inference across paragraph boundaries, and a passage with long sentences can be easy if it is highly redundant.
This is why a readability score alone is an unsafe input to test design. Difficulty calibration must combine readability with task difficulty, Topic Familiarity, and the cognitive demand of the question stem.
Use in test development
Readability matters most in three places. First, in stratified passage sourcing: when an item bank needs passages calibrated to specific CEFR bands, readability indices are the cheapest first filter. Second, in fairness review: when a passage scores noticeably above the band the test claims to target, that is evidence of possible construct-irrelevant difficulty. Third, in materials production at scale: AI-generated passages must be readability-validated before entering the bank, or the generator's stylistic preferences can silently drift the calibration.
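The first-filter and drift checks can be sketched as a simple screen over precomputed scores. Everything here is an illustrative assumption: the `Passage` record, the status labels, the one-grade drift tolerance, and especially `BAND_GRADE_RANGE`, which is a hypothetical placeholder and not a published CEFR-to-grade alignment. The sketch also assumes each passage already carries a Flesch-Kincaid grade from some upstream scorer.

```python
from dataclasses import dataclass
from statistics import mean

# HYPOTHETICAL band-to-grade ranges, for illustration only. A real item bank
# would have to establish this alignment empirically for its own population.
BAND_GRADE_RANGE: dict[str, tuple[float, float]] = {
    "B1": (4.0, 7.0),
    "B2": (6.0, 9.0),
    "C1": (8.0, 12.0),
}


@dataclass
class Passage:
    passage_id: str
    target_band: str  # CEFR band the passage is meant to serve
    fk_grade: float   # precomputed Flesch-Kincaid Grade Level


def screen(passage: Passage,
           ranges: dict[str, tuple[float, float]] = BAND_GRADE_RANGE) -> str:
    """First-pass filter: compare the passage's grade with its target band's range."""
    low, high = ranges[passage.target_band]
    if passage.fk_grade > high:
        # Candidate construct-irrelevant difficulty: route to fairness review.
        return "above_band"
    if passage.fk_grade < low:
        return "below_band"
    return "in_band"


def batch_drift(generated: list[float], reference: list[float],
                tolerance: float = 1.0) -> bool:
    """True if a generated batch's mean grade moves more than `tolerance`
    grades away from the reference set's mean grade."""
    return abs(mean(generated) - mean(reference)) > tolerance


if __name__ == "__main__":
    bank = [
        Passage("p1", "B1", 8.2),
        Passage("p2", "B2", 7.5),
        Passage("p3", "C1", 6.0),
    ]
    for p in bank:
        print(p.passage_id, screen(p))  # p1 above_band / p2 in_band / p3 below_band
    print(batch_drift([9.5, 10.0, 10.5], [8.0, 8.5, 9.0]))  # True
```

An `above_band` result is better treated as a prompt for human review than as an automatic rejection, since the index cannot see the discourse-level difficulty described earlier in this entry.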
Key References
- Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
- Kincaid, J. P., Fishburne, R. P., Rogers, R. L. & Chissom, B. S. (1975). Derivation of new readability formulas for Navy enlisted personnel. Naval Technical Training Command Research Branch Report 8-75.
- Graesser, A. C., McNamara, D. S. & Kulikowich, J. M. (2011). Coh-Metrix: Providing multilevel analyses of text characteristics. Educational Researcher, 40(5), 223–234.
- Nation, I. S. P. (2009). Teaching ESL/EFL Reading and Writing. Routledge.
See Also
- Flesch-Kincaid Grade Level: the most-used grade-mapped readability index
- Coh-Metrix: discourse-level computational readability
- Text Complexity: the broader construct readability operationalises
- CEFR: the framework readability scores are commonly aligned to