Graded Reader Construction
How publishers build graded readers: the craft decisions behind vocabulary control, grammar grading, readability measurement, and the treatment of unknown words.
The Core Vocabulary System
Headwords vs. Lemmas vs. Word Families
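These terms are frequently conflated but operate differently in reader design. The token and the publisher's headword are defined alongside them because both are needed to read a level specification correctly.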
Token: any running word in the text. "run", "runs", "running", "ran" = 4 tokens.
Lemma: the base form + its inflections only. "run" covers runs/running/ran = 1 lemma, 4 tokens.
Word family: lemma + all derived forms. "run" covers run/runs/ran/running/runner/runners/runnable = 1 family, many forms. Nation (2016) defines six levels of family breadth, from base-only (Level 1) to all infrequent derivations (Level 6). The AWL and much published vocabulary research count families at Level 6.
Headword: publisher terminology for a counted unit. Oxford Bookworms and most British publishers use headwords defined approximately as word families (covering inflections + common derivations). So "run" at 400 headwords means a learner needs to recognise runner, running, and ran, but publishers vary in how strictly they apply this.
Key implication for graded reader authors: a 400-headword level does not mean 400 distinct word forms, and it says nothing about text length in tokens. Each headword represents a family cluster, so the author controls the number of headword families deployed, not the raw token count.
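The distinction between tokens, distinct forms, and headwords can be made concrete with a small counting sketch. The family list below is a hypothetical five-entry stand-in for a real publisher list, and `profile` is an illustrative name rather than any publisher's tool.

```python
import re

# Hypothetical mini family list: headword -> member forms (illustrative only).
FAMILIES = {
    "run": {"run", "runs", "ran", "running", "runner", "runners"},
    "quick": {"quick", "quickly", "quicker"},
    "shop": {"shop", "shops"},
    "the": {"the"},
    "to": {"to"},
}

# Reverse index: each word form points back to its headword.
FORM_TO_HEADWORD = {
    form: head for head, forms in FAMILIES.items() for form in forms
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Count running words, distinct forms, and headword families used."""
    tokens = tokenize(text)
    types = set(tokens)
    headwords = {FORM_TO_HEADWORD[t] for t in tokens if t in FORM_TO_HEADWORD}
    off_list = sorted(t for t in types if t not in FORM_TO_HEADWORD)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "headwords": len(headwords),
        "off_list": off_list,
    }


sample = "The runner ran quickly to the shop. The runners run to the shops."
result = profile(sample)
print(result)  # 13 tokens, 9 distinct forms, 5 headwords, nothing off-list
```

The sample uses 13 running words and 9 distinct forms but only 5 headwords, which is the figure a headword level actually constrains.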
Headword Lists by Publisher
| Publisher | Series | Levels | Headword Range | CEFR Approx |
|---|---|---|---|---|
| Oxford | Oxford Bookworms Library | Starter–Stage 6 | 250–2,500 | A1–C1 |
| Penguin/Pearson | Penguin Readers | Starter–Level 7 | 200–3,000 | Pre-A1–C1 |
| Cambridge | Cambridge English Readers | Starter–Level 6 | 250–3,800 | A1–C1 |
| Macmillan | Macmillan Readers | Starter–Upper | 300–2,200 | A1–B2 |
| National Geographic | Footprint Reading Library | 8 levels | 800–3,000 | A2–B2 |
Note: Claridge (2012) found significant inter-series inconsistency. A Bookworms text at 1,400 headwords is labelled B1/B2; a Macmillan text at the same count is labelled A2/B1. These mismatches matter when building multi-series libraries, because a CEFR label in one series cannot be assumed to match the same label in another.
The NGSL-GR: A Modern Alternative
The New General Service List, Graded Reader edition (NGSL-GR 1.0), was designed specifically for graded reader production. It has 11 bands: bands 1–8 add 400 words each and bands 9–11 add 600 words each, providing finer-grained control than traditional publisher levels. A text at NGSL-GR Band 4 therefore has 1,600 high-frequency words available (4 × 400). The list is derived from a 273-million-word corpus and is increasingly used by researchers for profiling reader vocabulary.
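The band arithmetic can be checked with a short sketch. It assumes only the band widths stated above (400 words for bands 1–8, 600 for bands 9–11); the cumulative totals are derived from those widths and have not been checked against the published list. The function names are illustrative.

```python
from itertools import accumulate

# Band widths as described in the text: eight bands of 400, then three of 600.
BAND_WIDTHS = [400] * 8 + [600] * 3

# Cumulative words available at each band (index 0 corresponds to Band 1).
CUMULATIVE = list(accumulate(BAND_WIDTHS))


def words_available(band):
    """Number of list words an author may draw on at a given band."""
    if not 1 <= band <= len(CUMULATIVE):
        raise ValueError(f"band must be between 1 and {len(CUMULATIVE)}")
    return CUMULATIVE[band - 1]


def band_for_rank(rank):
    """Lowest band that includes a word at this 1-based frequency rank."""
    for band, limit in enumerate(CUMULATIVE, start=1):
        if rank <= limit:
            return band
    return None  # the word lies beyond the last band


print(words_available(4))   # 1600
print(band_for_rank(1601))  # 5
```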
Vocabulary Control Techniques
The 98% Coverage Threshold
Nation (2001) and Hu & Nation (2000) established the foundational finding: a reader needs to know approximately 98% of running words (1 unknown in 50) for independent, pleasurable comprehension. At 95% (1 unknown in 20), comprehension becomes effortful and acquisition drops sharply.
Implications:
- A 400-headword text that introduces 50 new words must recycle those words heavily so that any given page maintains 98% known coverage
- Coverage ≠ comprehension; a learner may know 98% of words but fail to comprehend due to syntactic complexity or background knowledge gaps
- Coverage is measured over running words (tokens), but knowledge is counted in word families: a learner who knows "run" is assumed to have coverage of "runner" at most proficiency levels
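A page-by-page coverage check along these lines might look like the sketch below. It assumes the known vocabulary has already been expanded into a flat set of word forms, and it treats each string in `pages` as one page. The function names and the toy data are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def coverage(tokens, known_forms):
    """Share of running words whose form is in the known set."""
    if not tokens:
        return 1.0
    known = sum(1 for t in tokens if t in known_forms)
    return known / len(tokens)


def pages_below_threshold(pages, known_forms, threshold=0.98):
    """Return (page_number, coverage) for pages that fall below the threshold."""
    flagged = []
    for number, page in enumerate(pages, start=1):
        c = coverage(tokenize(page), known_forms)
        if c < threshold:
            flagged.append((number, round(c, 3)))
    return flagged


# Toy data: page 1 has 1 unknown word in 50, page 2 has 1 unknown in 20.
known = {"cat"}
pages = [
    " ".join(["cat"] * 49 + ["zebra"]),
    " ".join(["cat"] * 19 + ["zebra"]),
]
flagged = pages_below_threshold(pages, known)
print(flagged)  # [(2, 0.95)]
```

Checking per page rather than per book matters because a text can average 98% overall while individual pages cluster new words and fall well below it.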
Vocabulary size needed to reach 98% coverage, by text type:
| Text type | Vocabulary needed (word families) |
|---|---|
| Graded readers (controlled) | 300–2,500 depending on level |
| General fiction | ~8,000–9,000 (Nation, 2006) |
| Newspapers | ~9,000–10,000 |
| Academic text | ~8,000 + AWL |
Introducing, Recycling, and Glossing New Words
Introduction: New headwords in well-designed graded readers appear first in a high-support context. The meaning is recoverable from surrounding text (semantic transparency), illustration, or a glossary entry. Cold introduction (using a word without contextual scaffolding) is an authoring error.
Recycling: Nation's research suggests a word needs approximately 10–15 meaningful encounters for productive acquisition. For graded readers, this means a new headword introduced on page 1 should reappear naturally 9–14 more times by the end. Waring & Takaki (2003) confirmed that single-encounter learning from graded readers is modest; repeated encounters drive retention.
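A recycling audit can be sketched as a simple encounter count per new headword. The default minimum of 10 takes the lower bound of the 10–15 range above. The headword-to-forms mapping and the function names are hypothetical. Raw counts also cannot tell whether an encounter is "meaningful", so this is only a first screen.

```python
import re
from collections import Counter


def encounter_counts(text, new_headwords):
    """Count occurrences of each new headword, summed over its listed forms."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        head: sum(counts[form] for form in forms)
        for head, forms in new_headwords.items()
    }


def under_recycled(text, new_headwords, minimum=10):
    """Headwords that appear fewer than `minimum` times in the text."""
    totals = encounter_counts(text, new_headwords)
    return sorted(head for head, n in totals.items() if n < minimum)


# Toy manuscript: "lantern" recurs ten times, "harbour" appears once.
story = "The lantern glowed. " * 10 + "A harbour appeared."
new_words = {
    "lantern": {"lantern", "lanterns"},
    "harbour": {"harbour", "harbours"},
}
print(encounter_counts(story, new_words))  # {'lantern': 10, 'harbour': 1}
print(under_recycled(story, new_words))    # ['harbour']
```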
Glossing: Three mechanisms:
- Marginal/interlinear gloss: brief L1 or L2 definition in the margin. Research (Hulstijn, Hollander & Greidanus, 1996) shows glosses increase noticing and short-term retention but may reduce reading flow.
- End-of-chapter glossary: lower interference with reading flow; learners must actively retrieve.
- Running footnote: used by Oxford Bookworms for cultural and proper-noun items outside the headword list.
The Claridge (2012) study noted that Cambridge English Readers deliberately exclude glossaries and support materials, positioning their texts as adult leisure reading. Oxford and Penguin include notes and glossaries. Neither approach is demonstrably superior for acquisition; the choice reflects design philosophy.
Handling Proper Nouns, Names, and Cultural References
Proper nouns present a structural problem: character names, place names, brand names, and cultural references fall outside any headword list but are unavoidable in narrative. Publisher practice varies:
- Not counted: Most publishers do not count proper nouns in the headword total. "London", "Maria", "Toyota" are treated as transparent additions.
- Glossed: Culturally opaque references (e.g., "the National Health Service" in a British story) receive footnotes or brief in-text definition.
- Adapted: In simplified versions of classics, culturally embedded references may be rewritten or modernised. Oxford Bookworms Guidelines require that cultural references be explained or replaced if they are likely to be opaque to international audiences.
- Invented names: Original graded reader authors often choose phonologically simple, internationally recognisable names (Anna, Tom, Carlos) to reduce decoding load for diverse learners.
Grammar Grading
Grammar is graded in parallel with vocabulary. Each publisher uses a grammar syllabus that defines which structures are permissible at each level. The Oxford Bookworms graded grammar syllabus (publicly available) is the most widely cited model:
Oxford Bookworms Grammar Progression
| Stage | CEFR | Headwords | Key Grammar |
|---|---|---|---|
| Starter | A1 | 250 | Present simple, present continuous, basic modals (can/can't), imperatives, simple coordination |
| Stage 1 | A1–A2 | 400 | Past simple, coordination (and/but/or), subordination (before, after, when, because, so) |
| Stage 2 | A2–B1 | 700 | Present perfect, will (future), have to/must/could, comparative adjectives, simple if-clauses, past continuous, tag questions, ask/tell + infinitive |
| Stage 3 | B1 | 1,000 | Should/may, present perfect continuous, used to, past perfect, causative, relative clauses |
| Stage 4 | B1–B2 | 1,400 | Future perfect/continuous, past modals (might have, should have), more complex passives, reported speech |
| Stage 5 | B2 | 1,800 | Full range of conditionals, complex noun phrases, embedded clauses |
| Stage 6 | B2–C1 | 2,500 | Near-native grammar range; archaic forms acceptable in classics |
The key design principle is no structures above the level. An author writing a Stage 2 text cannot use the past perfect even once. This discipline is harder than vocabulary control because English grammar is recursive; sophisticated meaning often requires complex syntax.
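A first-pass screen for one above-level structure, the past perfect in a Stage 2 manuscript, could be sketched as follows. This is a rough regular-expression heuristic, not a parser. The adverb and participle lists are small hypothetical samples. It ignores contracted forms such as "she'd", and it will produce both false positives and misses, so every hit needs an editor's judgment.

```python
import re

# Hypothetical, deliberately short lists; a real check would need fuller ones.
IRREGULAR_PARTICIPLES = [
    "been", "gone", "seen", "done", "taken", "known",
    "given", "written", "left", "made", "said",
]
ADVERBS = ["already", "just", "never", "once", r"\w+ly"]

# "had" + optional adverb + something that looks like a past participle.
PAST_PERFECT = re.compile(
    r"\bhad\s+(?:(?:%s)\s+)?(?:\w{2,}ed|%s)\b"
    % ("|".join(ADVERBS), "|".join(IRREGULAR_PARTICIPLES)),
    re.IGNORECASE,
)


def find_past_perfect(text):
    """Return the substrings that look like past perfect verb groups."""
    return [m.group(0) for m in PAST_PERFECT.finditer(text)]


print(find_past_perfect("She had finished the book before he arrived."))
# ['had finished']
print(find_past_perfect("They had already gone home."))
# ['had already gone']
print(find_past_perfect("She had a red car."))
# []
```

The same pattern-per-structure approach extends to other banned forms, but its precision drops quickly, which is one reason grammar control remains largely an editorial task.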
Sentence Length Norms
Empirical studies of graded readers (Claridge, 2005; Grabowski, 2015) yield approximate sentence length norms:
| Level | CEFR | Mean sentence length (words) | Max clause depth |
|---|---|---|---|
| Starter | A1 | 7–10 | 1 (simple/coordinate) |
| Stage 1–2 | A1–A2 | 10–13 | 2 (one subordinate) |
| Stage 3–4 | B1 | 13–17 | 2–3 |
| Stage 5–6 | B2 | 16–22 | 3–4 |
| Authentic adult fiction | N/A | 18–25 | Unrestricted |
Syntactic depth (number of clause embeddings per sentence) is as important as raw sentence length. A long sentence built from flat coordination (the pattern of "She ran and fell and cried and looked up", extended to 20 words) is more accessible than a much shorter sentence with heavy embedding (such as "The man she'd once trusted had gone").
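Mean sentence length is easy to compute; clause depth is not. The sketch below measures sentence length and counts overt subordinating words as a crude proxy for embedding. The subordinator list is a hypothetical sample. As the last example shows, the proxy misses reduced relative clauses that have no marker word, so a real depth measure would need a syntactic parser.

```python
import re
from statistics import mean

# Hypothetical sample of overt subordinators and relativisers.
SUBORDINATORS = {
    "because", "when", "before", "after", "although", "if",
    "that", "who", "which", "while", "since", "unless",
}


def split_sentences(text):
    """Naive split on whitespace that follows terminal punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_stats(text):
    """Per-sentence word count and count of overt subordinators."""
    rows = []
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z']+", sentence)
        markers = sum(1 for w in words if w.lower() in SUBORDINATORS)
        rows.append({"words": len(words), "subordinators": markers})
    return rows


def mean_sentence_length(text):
    """Mean number of words per sentence."""
    return mean(row["words"] for row in sentence_stats(text))


passage = (
    "She ran and fell and cried and looked up. "
    "He left because it was late when she called."
)
print(sentence_stats(passage))
# [{'words': 9, 'subordinators': 0}, {'words': 9, 'subordinators': 2}]
print(mean_sentence_length(passage))  # 9

# Embedded, but with no marker word for the proxy to find:
print(sentence_stats("The man she'd once trusted had gone."))
# [{'words': 7, 'subordinators': 0}]
```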
Readability Measures
Formula-Based Measures and Their Limits
Flesch Reading Ease (Flesch, 1948): Based on average sentence length (ASL) and average syllables per word (ASW). Score 0–100; higher = easier. Designed for native English readers. Not calibrated to CEFR. A Flesch score of 60–70 corresponds roughly to plain English for adult native readers; EFL texts typically target higher scores (70–80+) even at B2.
Flesch-Kincaid Grade Level: Converts the above to US school grade. Grade 5 ≈ 10-year-old. Again, calibrated for L1 readers; should be used cautiously for L2 material. A graded reader at 400 headwords will often score Grade 3–5 FK, but this does not mean it is appropriate for 8-year-olds.
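Both Flesch measures are linear functions of ASL and ASW, which the sketch below computes using the standard published coefficients. The syllable counter is a rough vowel-group heuristic added for illustration; it is not part of either formula, and its errors feed directly into the scores.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_stats(text):
    """Return (average sentence length, average syllables per word)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words) / len(sentences), syllables / len(words)


def flesch_reading_ease(asl, asw):
    """Flesch Reading Ease: higher scores indicate easier text."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(asl, asw):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * asl + 11.8 * asw - 15.59


# Worked example: 10 words per sentence, 1.5 syllables per word.
print(round(flesch_reading_ease(10, 1.5), 3))   # 69.785
print(round(flesch_kincaid_grade(10, 1.5), 2))  # 6.01
```

Because both scores depend only on sentence and word length, a text can score as "easy" while relying on short but rare words, which is why these measures sit alongside vocabulary profiling rather than replacing it.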
Lexile Framework: Used primarily in US education. A Lexile score is derived from word frequency and sentence length. Lexile 500–700 = roughly A2–B1; 1000+ = C1. Lexile is increasingly used by ELT publishers but was designed for L1 comprehension, so it systematically underestimates difficulty for L2 readers who lack background vocabulary even at "easy" Lexile levels.
CEFR-J Readability Index (CEFR-J Rater): Specifically designed for EFL contexts. Developed at Tokyo University of Foreign Studies for Japanese learners but applicable cross-linguistically. It integrates lexical frequency bands, syntactic complexity measures, and text length. More aligned with L2 reading behaviour than Flesch or Lexile.
Limitations of all formula measures: Readability formulas are proxy measures. They do not directly assess:
- Background knowledge requirements
- Discourse coherence complexity
- Cultural load
- Pragmatic inference demands
For graded reader QA, formula measures are used as a screen, not a verdict. A text that passes a formula check still needs expert editorial review.
The "Plus-One" / i+1 Principle in Reader Design
Krashen's Input Hypothesis (1982) states that acquisition occurs when learners encounter input at i+1, slightly beyond their current competence. Graded readers operationalise this by:
- Ensuring 95–98% of the text is composed of known words (the "i" base)
- Embedding approximately 2–5% new vocabulary (the "+1" layer) in high-support contexts
- Structuring grammar slightly above the learner's current production level but still comprehensible
The practical implication: a good graded reader at Bookworms Stage 2 should be genuinely comfortable for someone who just completed Stage 1, not someone who aspires to Stage 2. The unknown words should feel like discoveries, not obstacles.
Publisher Design Philosophies Compared
Oxford Bookworms Library
- Strictest grammar syllabus adherence of the major series (Hill, 1997, ELT Journal review)
- Includes activities, comprehension questions, and cultural notes
- Headword counts publicly documented in detail
- Both simplified classics and original stories
- Most widely used in IELTS/EAP preparation contexts due to authentic topic range
Penguin Readers (Pearson)
- More market-oriented: heavy use of film tie-ins, celebrity biographies, contemporary culture
- Asian market editions include more exercises and glossaries
- Now CEFR-mapped but with less rigorous grammar syllabus documentation
- Broader level range (Pre-A1 starter exists); vocabulary definitions looser
Cambridge English Readers
- Original fiction only; no simplified classics
- Deliberately exclude support materials: no glossary, no activities. Texts look like "real" books
- Online support available separately
- Most consistently positioned as adult leisure reading
- Headword range extends to 3,800 (most advanced of the major series)
- Design philosophy closest to authentic text; some argue this makes them less pedagogically scaffolded
Macmillan Readers
- Mid-range positioning between Oxford's rigour and Penguin's commercial orientation
- Strong non-fiction and factual reader strand
- Clear CEFR labelling but with acknowledged level inconsistencies vs. Oxford (Claridge, 2012)
Key Academic References
- Day, R.R. & Bamford, J. (1998). Extensive Reading in the Second Language Classroom. Cambridge University Press.
- Nation, I.S.P. (2001). Learning Vocabulary in Another Language. Cambridge University Press.
- Nation, I.S.P. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59–82.
- Nation, I.S.P. (2015). Principles guiding vocabulary learning through extensive reading. Reading in a Foreign Language, 27(1), 136–145.
- Hu, M. & Nation, I.S.P. (2000). Unknown vocabulary density and reading comprehension. Reading in a Foreign Language, 13(1), 403–430.
- Waring, R. & Takaki, M. (2003). At what rate do learners learn and retain new vocabulary from reading a graded reader? Reading in a Foreign Language, 15(2), 130–163.
- Claridge, G. (2005). Simplification in graded readers: Measuring the authenticity of graded texts. Reading in a Foreign Language, 17(2), 144–158.
- Claridge, G. (2012). Graded readers: How the publishers make the grade. Reading in a Foreign Language, 24(1), 106–119.
- Krashen, S. (1982). Principles and Practice in Second Language Acquisition. Pergamon Press.
See Also
- Graded Reader: overview note with series tables
- Word Families: Nation's word family levels
- BNC COCA Headword Lists (2K 3K 4K): the frequency lists grading systems draw on
- Comprehensible Input: Krashen's i+1 in full theoretical context
- Incidental Vocabulary Learning: acquisition mechanism graded readers exploit
- Graded Reader AI Pipeline: building graded content with NLP tools and LLMs