ELTiverse

Learner Corpus

research-methodology, SLA

A learner corpus is a systematic, electronic collection of written texts or transcribed speech produced by second or foreign language learners. Unlike native-speaker corpora, learner corpora document Interlanguage (the developing language systems of L2 users). This enables large-scale analysis of error patterns, developmental features, and the effects of L1 background, proficiency level, and task type on learner production.

Major Learner Corpora

Corpus | Full name | Size | Features
ICLE | International Corpus of Learner English | ~5.7 million words | Argumentative essays from advanced university learners; 25+ L1 backgrounds; directed by Sylviane Granger (UCLouvain, since 1990)
EFCAMDAT | EF-Cambridge Open Language Database | ~83 million words | Written submissions from EF online learners; 180+ nationalities; all proficiency levels
CLC | Cambridge Learner Corpus | ~55 million words | Exam scripts from Cambridge English exams; error-tagged
LINDSEI | Louvain International Database of Spoken English Interlanguage | ~1 million words | Spoken interviews from advanced learners; 11 L1 backgrounds
ICNALE | International Corpus Network of Asian Learners of English | ~2.3 million words | Written and spoken data from 10 Asian countries

What Learner Corpora Enable

  • Error Analysis at scale — identifying systematic error patterns across thousands of learners, moving beyond individual case studies
  • L1 influence — comparing how learners from different L1 backgrounds produce the same target features (Language Transfer, Crosslinguistic Influence)
  • Developmental profiling — tracking which features appear at which proficiency levels
  • Overuse/underuse analysis — comparing learner and native-speaker corpora to identify what learners produce too much or too little of
  • Data-driven learning — using learner corpus findings to inform teaching materials (see Concordance Lines)
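
As an illustration of the overuse/underuse comparison listed above, the following is a minimal Python sketch. It assumes you already have the raw frequency of one feature in a learner corpus and in a reference corpus, plus both corpus sizes in words. It normalises the counts per million words and computes the two-corpus log-likelihood statistic that is widely used in corpus linguistics for this kind of comparison. The function names and the example counts are hypothetical and are not taken from any of the corpora above.

```python
import math


def per_million(count, corpus_size):
    """Normalised frequency: occurrences per million words."""
    return count / corpus_size * 1_000_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood for one feature (simplified two-term form)."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def compare(count_learner, size_learner, count_ref, size_ref):
    """Return the direction of the difference and its log-likelihood score."""
    rel_learner = per_million(count_learner, size_learner)
    rel_ref = per_million(count_ref, size_ref)
    if rel_learner > rel_ref:
        direction = "overuse"
    elif rel_learner < rel_ref:
        direction = "underuse"
    else:
        direction = "equal"
    score = log_likelihood(count_learner, size_learner, count_ref, size_ref)
    return direction, score


# Hypothetical counts: a connector occurring 300 times in 1M learner words
# and 150 times in 1M reference words.
direction, score = compare(300, 1_000_000, 150, 1_000_000)
print(direction, round(score, 2))
```

A score above 3.84 corresponds to p < 0.05 on a chi-square distribution with one degree of freedom. The statistic only says that the frequencies differ; whether the difference reflects L1 influence, task effects, or the choice of reference corpus still has to be argued from the corpus design.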

Methodology

Learner corpus research follows the principles of Corpus Linguistics but adds learner-specific metadata:

  • L1 background — essential for cross-linguistic comparison
  • Proficiency level — self-reported, test-based, or institutionally assigned
  • Task type — essay, letter, narrative, spoken interview
  • Learning context — EFL/ESL, instructed/naturalistic, immersion
  • Error annotation — some corpora (CLC) include error tagging; others leave this to researchers
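
The metadata categories above can be pictured as fields on each text record. The sketch below is one hypothetical way to model them in Python and to use them for a simple cross-L1 comparison of error density. The class name, field names, and error-tag labels are assumptions made for illustration; they do not reproduce the schema or tagset of CLC or any other corpus.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LearnerText:
    """One corpus text with learner-specific metadata (hypothetical schema)."""
    text: str
    l1: str                 # first-language background, e.g. "Spanish"
    proficiency: str        # e.g. a CEFR label such as "B1"
    task_type: str          # e.g. "essay", "letter", "spoken interview"
    learning_context: str   # e.g. "EFL" or "ESL"
    error_tags: list[str] = field(default_factory=list)  # empty if unannotated


def errors_per_100_words_by_l1(texts):
    """Aggregate annotated errors per 100 words for each L1 group."""
    words = defaultdict(int)
    errors = defaultdict(int)
    for t in texts:
        words[t.l1] += len(t.text.split())  # crude whitespace tokenisation
        errors[t.l1] += len(t.error_tags)
    return {l1: 100 * errors[l1] / words[l1] for l1 in words if words[l1] > 0}


# Tiny invented sample; real corpora hold thousands of texts per group.
sample = [
    LearnerText("I am agree with this idea very much indeed today", "Spanish",
                "B1", "essay", "EFL", ["verb-form", "word-order"]),
    LearnerText("She go to school every day with her best friend", "Japanese",
                "A2", "essay", "EFL", ["agreement"]),
]
print(errors_per_100_words_by_l1(sample))
```

Keeping proficiency and task type on the same record matters because a comparison across L1 groups is only meaningful when those variables are held constant or controlled for.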

Concerns

  • Proficiency assignment — how reliably are learners classified into levels?
  • Task effects — different tasks elicit different language; corpus composition affects findings
  • Representativeness — most corpora over-represent university students from certain L1 backgrounds
  • Comparison corpus — what counts as the "native speaker" baseline? The native-speaker norm itself is contested in the age of ELF
  • Access — some major corpora are restricted or require licensing

Key References

  • Granger (2002) — learner corpus research: current status and future prospects
  • Granger, Dagneaux, Meunier & Paquot (2009) — ICLE
  • Geertzen, Alexopoulou & Korhonen (2013) — EFCAMDAT
  • Gilquin & Granger (2015) — learner corpus research in SLA
  • Meunier & Granger (2008) — Phraseology in Foreign Language Learning and Teaching

Related Terms