Pedagogic Corpus
The cumulative body of texts a learner meets across a course, treated as a corpus in its own right. Willis introduced the construct to argue that the texts a course supplies are not isolated reading or listening events but a single, shapeable database from which lexical, grammatical, and discourse patterns emerge through repeated exposure. Course design then becomes corpus engineering: choosing and ordering texts so that high-frequency items recur often enough for learners to build reliable generalisations.
The construct
For Willis, the learner's encounter with the language is mediated almost entirely by the texts the course delivers. If those texts are assembled without thought to recurrence and coverage, the learner sees high-frequency patterns sporadically and rare items disproportionately often. The pedagogic corpus reframes the question. Instead of asking whether each text is interesting or level-appropriate, the course writer asks what the course as a whole exposes the learner to: which words and structures appear, how often, in what contexts, and across which genres. Willis treats this aggregate as the data the learner is implicitly studying.
The construct shifts the unit of design from the lesson to the syllabus. Recycling, coverage, and pattern emergence are properties of the corpus, not of any single chapter. A coursebook that introduces "get" once, in passing, on page 40 has built a pedagogic corpus that systematically under-represents one of the most frequent verbs in English. The fix is structural rather than remedial: the verb has to be distributed through the texts of the course, not patched in with a supplementary exercise.
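The under-representation described above can be made visible by counting a target verb unit by unit. The following is a minimal sketch, not a method taken from Willis: the function name `unit_counts`, the three-unit miniature course, and the list of word forms are all hypothetical, and matching on exact word forms stands in for proper lemmatisation.

```python
import re

def unit_counts(units, forms):
    """Count, per unit, how many tokens match any of the given word forms."""
    wanted = {f.lower() for f in forms}
    counts = {}
    for name, text in units.items():
        # Crude tokenisation: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[name] = sum(1 for t in tokens if t in wanted)
    return counts

# Hypothetical miniature course: three very short "units".
units = {
    "unit1": "I go to work by bus. I arrive at nine.",
    "unit2": "She can get a ticket at the station.",
    "unit3": "We receive letters and obtain forms.",
}

# Assumed inflected forms of the verb; a real profile would use a lemmatiser.
GET_FORMS = ["get", "gets", "getting", "got", "gotten"]

counts = unit_counts(units, GET_FORMS)
print(counts)  # {'unit1': 0, 'unit2': 1, 'unit3': 0}
```

A profile like this, run over the full running text of a course, shows at a glance whether a frequent verb is spread across the corpus or confined to a single appearance.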
Distinct from corpus-informed materials
A common conflation is to treat "pedagogic corpus" as a synonym for corpus-informed materials design. The two are different. Corpus-informed materials draw on an external corpus (a reference corpus such as the BNC or COCA, or a learner corpus such as the Cambridge Learner Corpus) during authoring to check frequency, collocation, and naturalness of invented examples. The pedagogic corpus is the corpus the learner receives as the course unfolds. One feeds the writer's pen; the other is what the learner reads and hears. A course can be heavily corpus-informed at the writing stage and still produce a thin, skewed pedagogic corpus.
Implications for design
Three practical moves follow. First, the course writer should profile the running text against frequency lists to verify that high-frequency vocabulary recurs across chapters, not only within them. Second, recycling should be planned at the corpus level: a target item introduced in unit 3 should reappear naturally in units 5, 7, and 11, not be parked in a vocabulary box. Third, genre balance matters. A pedagogic corpus dominated by dialogues delivers limited exposure to written-register patterns; one dominated by expository articles starves learners of conversational lexis. Willis and Willis frame this as building a learner's database of language they have actually met, from which grammar and lexis are then induced.
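The three moves above can be sketched as a small profiling script. This is an illustration under stated assumptions, not a procedure given by Willis: the function names, the four-unit sample course, the three-word "frequency list", and the threshold of three units are all hypothetical, and words are matched by exact form with no lemmatisation.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Crude tokenisation: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def unit_range(units, target_words):
    """Map each target word to the sorted list of units in which it occurs."""
    targets = {w.lower() for w in target_words}
    seen = defaultdict(set)
    for unit_no, (genre, text) in units.items():
        for tok in set(tokenize(text)) & targets:
            seen[tok].add(unit_no)
    return {w: sorted(seen[w]) for w in sorted(targets)}

def under_recycled(ranges, min_units=3):
    """List target words met in fewer than min_units units (assumed threshold)."""
    return [w for w, found_in in ranges.items() if len(found_in) < min_units]

def genre_share(units):
    """Return each genre's share of the running words in the course."""
    totals = defaultdict(int)
    for genre, text in units.values():
        totals[genre] += len(tokenize(text))
    grand_total = sum(totals.values())
    return {g: n / grand_total for g, n in totals.items()}

# Hypothetical course: unit number -> (genre, text).
units = {
    3: ("dialogue", "Can you make a plan for the trip?"),
    5: ("article", "Cities make plans years ahead."),
    7: ("dialogue", "I will take the bus and make tea."),
    11: ("article", "Planners take time to decide."),
}

# Stand-in for a real frequency list.
HIGH_FREQ = ["make", "take", "give"]

ranges = unit_range(units, HIGH_FREQ)
print(ranges)                  # {'give': [], 'make': [3, 5, 7], 'take': [7, 11]}
print(under_recycled(ranges))  # ['give', 'take']
print(genre_share(units))
```

The first report covers recurrence across units, the second flags items that need planned recycling, and the third gives a rough measure of genre balance by share of running words. In practice the word list would come from an established frequency list and the texts from the full course, including listening transcripts.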
The construct underpins task-based and lexical syllabuses, where pattern emergence depends on cumulative exposure, but it applies to any course whose design takes seriously what learners are actually reading and hearing across the term.
References
- Willis, D. (1990). The Lexical Syllabus: A New Approach to Language Teaching. Collins ELT.
- Willis, J. & Willis, D. (Eds.). (1996). Challenge and Change in Language Teaching. Heinemann.
- Willis, D. (2003). Rules, Patterns and Words: Grammar and Lexis in English Language Teaching. Cambridge University Press.
- Willis, D. (2011). The pedagogic corpus and learners as researchers. In Language Description and Language Learning. John Benjamins.