ELTiverse

Keyword Analysis

Language Analysis | research-methodology

Keyword analysis is a corpus linguistics technique that identifies words occurring with unusual frequency in a target corpus compared to a reference corpus. These statistically significant items, called "keywords", reveal what a text or collection of texts is distinctively about, beyond what general English frequency would predict.

How It Works

  1. Compile a target corpus — The texts under investigation (e.g., a set of medical research articles)
  2. Select a reference corpus — A larger, general corpus representing "normal" English (e.g., the BNC or COCA)
  3. Compare frequencies — For each word, compare its frequency in the target corpus against its expected frequency based on the reference corpus
  4. Apply a statistical test — Words with statistically significant over- or under-representation are flagged as keywords

The result is a ranked list of items that characterise the target corpus — its aboutness and its distinctive linguistic features.
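
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how WordSmith Tools or any other package implements keyness: the tokenizer is deliberately crude, the function names (tokenize, log_likelihood, keywords) are invented for this example, and the statistic is the two-cell form of log-likelihood often used for corpus comparison, which may differ from the variant a given tool applies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood score for one word.

    a, b: counts of the word in the target and reference corpora
    c, d: total tokens in the target and reference corpora
    """
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def keywords(target_text, reference_text, top_n=5):
    """Return the top_n (word, score) pairs, highest keyness first."""
    target = Counter(tokenize(target_text))          # step 1: target corpus
    reference = Counter(tokenize(reference_text))    # step 2: reference corpus
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word in set(target) | set(reference):        # step 3: compare frequencies
        a, b = target[word], reference[word]
        scored.append((word, log_likelihood(a, b, c, d)))  # step 4: apply the test
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy texts, far too small for real analysis; they only show the mechanics.
    target_text = "patient diagnosis patient treatment the patient"
    reference_text = "the cat and the dog and the bird saw the tree"
    for word, score in keywords(target_text, reference_text):
        print(f"{word:12s} {score:6.2f}")
```

Note that the score alone does not say whether a word is over- or under-represented; a ranked list like this mixes both directions unless the relative frequencies are also compared.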

Statistical Measures

Measure | Description
Log-likelihood (Dunning 1993) | The most widely used keyness statistic; tests significance of frequency differences; good with low-frequency items
Chi-squared | Earlier measure; less reliable with sparse data
Odds ratio | Measures effect size rather than significance; how much more likely a word is in the target
Kullback-Leibler Divergence | Information-theoretic approach; treats keyness as effect size (Gries 2021)
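
The difference between a significance measure and an effect-size measure can be seen with a small worked comparison. The sketch below is illustrative only: the function names and the example counts are invented, chi-squared is the plain Pearson form for a 2x2 table without continuity correction, and the odds ratio is returned as infinity when a cell is zero (real tools typically handle zero counts differently, for example by smoothing).

```python
import math


def chi_squared(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for a 2x2 table.

    a, b: counts of the word in the target and reference corpora
    c, d: total tokens in the target and reference corpora
    """
    o11, o12 = a, b          # occurrences of the word
    o21, o22 = c - a, d - b  # all other tokens
    n = c + d
    numerator = n * (o11 * o22 - o12 * o21) ** 2
    denominator = (o11 + o12) * (o21 + o22) * (o11 + o21) * (o12 + o22)
    return numerator / denominator if denominator else 0.0


def odds_ratio(a, b, c, d):
    """Odds of the word in the target divided by its odds in the reference."""
    o11, o12 = a, b
    o21, o22 = c - a, d - b
    if o12 == 0 or o21 == 0:
        return math.inf  # assumption: no smoothing is applied in this sketch
    return (o11 * o22) / (o12 * o21)


# Hypothetical counts: the same relative frequencies at two corpus sizes.
small = (20, 10, 1_000, 1_000)
large = (200, 100, 10_000, 10_000)

for label, counts in (("small", small), ("large", large)):
    print(label, round(odds_ratio(*counts), 3), round(chi_squared(*counts), 2))
```

With these counts the odds ratio is about 2.02 in both cases, while chi-squared rises from about 3.38 to about 33.84. The effect size is unchanged; only the amount of evidence has grown with corpus size.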

Mike Scott's WordSmith Tools (1996 onwards) popularised keyword analysis and remains a standard tool. Scott defined a keyword simply as a word occurring with "unusual frequency" in comparison to a reference corpus.

Positive and Negative Keywords

  • Positive keywords: Words significantly more frequent in the target corpus (e.g., patient, diagnosis in medical texts)
  • Negative keywords: Words significantly less frequent than expected (e.g., informal vocabulary in legal texts)

Both types are informative: negative keywords reveal what a genre avoids as much as what it foregrounds.
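
The direction of a keyword can be read off by comparing relative frequencies once a word has passed the significance test. The helper below is a hypothetical sketch: the name keyword_direction and the default cut-off of 3.84 (the chi-squared critical value for p < 0.05 with one degree of freedom) are assumptions made for illustration, and stricter thresholds are common in practice.

```python
def keyword_direction(a, b, c, d, score, critical=3.84):
    """Label a word as a positive keyword, a negative keyword, or not key.

    a, b: counts of the word in the target and reference corpora
    c, d: total tokens in the target and reference corpora
    score: a keyness statistic already computed for the word
    critical: assumed significance cut-off (3.84 is p < 0.05 at 1 df)
    """
    if score < critical:
        return "not key"
    # Compare relative frequencies to decide the direction.
    return "positive" if a / c > b / d else "negative"


# Hypothetical counts and scores, for illustration only.
print(keyword_direction(20, 5, 1_000, 1_000, score=9.64))   # over-represented in the target
print(keyword_direction(5, 20, 1_000, 1_000, score=9.64))   # under-represented in the target
print(keyword_direction(10, 10, 1_000, 1_000, score=0.0))   # no difference
```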

Applications in ELT

Application | Example
Genre analysis | Identifying characteristic vocabulary of academic, journalistic, or legal genres
ESP/EAP materials | Extracting domain-specific vocabulary for specialised courses
Textbook evaluation | Checking whether a coursebook's vocabulary matches the target register
Learner corpus research | Comparing learner output against native speaker corpora to identify overuse/underuse
Test development | Ensuring reading passages contain vocabulary appropriate to the target level

Limitations

Keyword analysis identifies what is distinctive but not why. Interpretation requires qualitative analysis of concordance lines. Additionally, results are heavily influenced by the choice of reference corpus — different references produce different keyword lists.
