Speech Rate in Listening Materials
Speech rate is the speed of the recorded voice in a listening text, conventionally measured in words per minute (wpm) or syllables per second. It is one of the most tightly controlled editorial variables in coursebook audio production and one of the most studied variables in listening-comprehension research.
Native English benchmarks
Tauroza and Allison (1990), the standard reference for speech-rate benchmarks in ELT, sampled British English across four genres (radio monologue, conversation, interviews, and academic lectures) and reported a mean speech rate of approximately 170 wpm overall. Their genre-specific bands are the figures still cited in materials-development guidelines: radio monologue 160–190 wpm, interviews 160–190 wpm, conversation 210–230 wpm (a word-based figure raised by the density of fillers and short turns), and academic lectures around 140 wpm. The often-quoted shorthand of "natural English at 150–190 wpm" derives from this study.
Speaking rate vs articulation rate
Two related measures need to be kept distinct. Speaking rate counts all words across total elapsed time, including pauses, hesitations, and inter-turn gaps. Articulation rate counts words across phonation time only, excluding silent pauses. The two diverge sharply in spontaneous conversation: speakers may articulate at near-monologue speed during phonation but pause frequently, producing a much lower speaking rate. Most coursebook editorial guidelines specify speaking rate; perceptual difficulty for listeners, however, tracks articulation rate more closely, because pauses give working memory time to recover.
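The difference between the two measures can be made concrete with a small calculation. The sketch below is illustrative only: the function names and the sample clip (120 words over 60 seconds, 20 of them silent pause) are assumptions chosen for the arithmetic, not figures from the literature.

```python
def speaking_rate_wpm(word_count: int, total_seconds: float) -> float:
    """Words per minute over total elapsed time, pauses included."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return word_count / total_seconds * 60


def articulation_rate_wpm(
    word_count: int, total_seconds: float, pause_seconds: float
) -> float:
    """Words per minute over phonation time only (silent pauses excluded)."""
    phonation_seconds = total_seconds - pause_seconds
    if phonation_seconds <= 0:
        raise ValueError("pause time must be shorter than total time")
    return word_count / phonation_seconds * 60


# Hypothetical clip: 120 words, 60 s elapsed, 20 s of silent pauses.
speaking = speaking_rate_wpm(120, 60.0)
articulation = articulation_rate_wpm(120, 60.0, 20.0)
print(f"speaking rate: {speaking:.0f} wpm")          # 120 wpm
print(f"articulation rate: {articulation:.0f} wpm")  # 180 wpm
```

On these assumed figures the same clip sits within a slowed lower-level band by speaking rate while being articulated at a native-like rate, which is the gap the paragraph above describes.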
Coursebook practice
ELT publishers' editorial style guides specify speech-rate bands by level, typically slowing audio at A1–A2 to around 110–130 wpm, easing toward 130–150 wpm at B1, and approaching native rates of 150–170 wpm at B2 and above. The slowing is achieved through voice-actor delivery rather than through digital time-stretching, which would distort phonological cues. The practice is consistent across major publishers but largely undocumented in the public materials-development literature.
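As a rough illustration of how such bands might be applied when checking a recording, the sketch below encodes the typical ranges quoted above as a lookup table. The table, the function name `check_rate`, and the treatment of C1 and C2 as sharing the B2 band are assumptions for the example; actual publisher style guides differ and are largely unpublished.

```python
# Illustrative bands (wpm) based on the typical figures quoted in the text.
# These are assumptions for the example, not any publisher's actual guideline.
LEVEL_BANDS_WPM = {
    "A1": (110, 130),
    "A2": (110, 130),
    "B1": (130, 150),
    "B2": (150, 170),
    "C1": (150, 170),  # "B2 and above" treated as one band here
    "C2": (150, 170),
}


def check_rate(level: str, measured_wpm: float) -> str:
    """Report whether a measured speaking rate falls inside the level's band."""
    if level not in LEVEL_BANDS_WPM:
        raise ValueError(f"unknown CEFR level: {level!r}")
    low, high = LEVEL_BANDS_WPM[level]
    if measured_wpm < low:
        return "below band"
    if measured_wpm > high:
        return "above band"
    return "within band"


print(check_rate("A2", 125))  # within band
print(check_rate("B1", 160))  # above band
```

Because the guidelines specify speaking rate, the measured value passed in here would be words over total elapsed time, pauses included.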
The deceleration debate
Griffiths (1992) tested three rates with Japanese lower-intermediate EFL learners (128, 188, and 250 wpm) and found that comprehension scores at the slow rate were significantly higher than at the average and fast rates, with no significant difference between average and fast. The result was read at the time as an empirical warrant for slowing audio at lower levels. Subsequent work has complicated the picture. Slowing helps comprehension on the immediate task but does not develop the perceptual capacity to handle natural rates; learners trained on slowed audio show poor transfer to native-rate input. The materials-development position now widely endorsed, articulated by Buck (2001) and reinforced in later literature, is that decelerating audio to aid comprehension on a single task is a short-term scaffold, not a syllabus strategy. From B1 onward, exposure to native-rate input, supported by task design, repeated listening, and post-task transcript work, develops the listening capacity that slowed audio cannot.
References
- Buck, G. (2001). Assessing Listening. Cambridge University Press.
- Griffiths, R. (1992). Speech rate and listening comprehension: Further evidence of the relationship. TESOL Quarterly, 26(2), 385–390. https://doi.org/10.2307/3587015
- Tauroza, S., & Allison, D. (1990). Speech rates in British English. Applied Linguistics, 11(1), 90–105. https://doi.org/10.1093/applin/11.1.90