Lexical Profiling


Lexical profiling runs a text against one or more frequency lists and reports the proportion of tokens that fall into each band, plus the items that fall outside every band. It is the standard diagnostic for asking whether a text sits at the lexical level a target reader can handle, and the workhorse routine behind graded-reader construction, coursebook draft auditing, and exam-text vetting.

A typical profile against the BNC/COCA 1K–25K family lists returns a coverage figure for each 1,000-word band, a cumulative figure (the share of tokens covered up to and including that band), and an off-list residue: every token that matches none of the chosen lists, typically proper nouns, very low-frequency items, and technical terms. The output answers a single question: where does this text become lexically opaque for a learner who has mastered band n?
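The three outputs described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular profiler is implemented: the names `TOY_BANDS`, `tokenize`, and `profile` are hypothetical, the band mapping is a hand-made toy rather than a real frequency list, and tokens are looked up by surface form instead of by word family or lemma.

```python
import re
from collections import Counter

# Hypothetical toy mapping from word form to 1,000-word band.
# Real profilers load full family or lemma lists instead.
TOY_BANDS = {
    "the": 1, "a": 1, "old": 1, "dog": 1, "saw": 1, "in": 1, "garden": 1,
    "fox": 2,
    "lurking": 3,
}

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes (crude on purpose)."""
    return re.findall(r"[a-z']+", text.lower())

def profile(text, band_of):
    """Return (rows, off_list); each row is (band, tokens, coverage %, cumulative %)."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        return [], []
    # Count tokens per band, skipping anything the list does not contain.
    counts = Counter(band_of[tok] for tok in tokens if tok in band_of)
    rows, running = [], 0
    for band in sorted(counts):
        running += counts[band]
        rows.append((band, counts[band],
                     100 * counts[band] / total,
                     100 * running / total))
    # Off-list residue: distinct tokens found in no band.
    off_list = sorted({tok for tok in tokens if tok not in band_of})
    return rows, off_list

rows, off_list = profile("The old dog saw a fox lurking in Ambleside garden", TOY_BANDS)
for band, n, pct, cum in rows:
    print(f"band {band}: {n} tokens, {pct:.1f}% (cumulative {cum:.1f}%)")
print("off-list:", off_list)
```

On this ten-token sentence the sketch reports 70% coverage at band 1, 80% cumulative at band 2, and 90% cumulative at band 3, with "ambleside" left as off-list residue.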

Inputs determine outputs

A profile is only as principled as the list it runs against. The same text profiled against Nation's BNC/COCA lists, the GSL+AWL pairing, the New General Service List, or the Lancaster lemma lists will yield different coverage figures and different off-list residues. The choice of list encodes assumptions about register (spoken vs written), variety (British, American, mixed), counting unit (word family vs lemma vs flemma), and list edition. A profile reported without specifying its list is uninterpretable.

The unit of analysis matters as much as the list. RANGE and AntWordProfiler default to word families; the NGSL Profiler operates on lemmas; some research uses flemmas (a headword and its inflected forms grouped regardless of part of speech, with derived forms excluded). Comparing profiles across studies that use different units is unsafe without conversion.
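A small sketch can show why the counting unit changes the figures. It assumes a learner who knows only the headword "nation" and measures coverage of the same five tokens under two hypothetical hand-built mappings, `FAMILY_OF` and `LEMMA_OF`; neither is taken from a published list, and the helper `coverage` is illustrative only.

```python
# Hypothetical mappings from word form to counting unit (illustration only).
FAMILY_OF = {
    "nation": "nation", "nations": "nation", "national": "nation",
    "nationally": "nation", "nationalise": "nation",
}
LEMMA_OF = {
    "nation": "nation", "nations": "nation", "national": "national",
    "nationally": "nationally", "nationalise": "nationalise",
}

def coverage(tokens, unit_of, known_units):
    """Percentage of tokens whose counting unit is among the known units."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if unit_of.get(tok) in known_units)
    return 100 * hits / len(tokens)

tokens = ["nation", "nations", "national", "nationally", "nationalise"]
known = {"nation"}  # the learner is assumed to know this one unit

family_cov = coverage(tokens, FAMILY_OF, known)
lemma_cov = coverage(tokens, LEMMA_OF, known)
print(f"word-family coverage: {family_cov:.0f}%")
print(f"lemma coverage: {lemma_cov:.0f}%")
```

Under the family mapping all five tokens count as known (100%); under the lemma mapping only "nation" and "nations" do (40%). The same text and the same learner therefore produce different coverage figures depending on the unit.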

The major tools

Five tools dominate practical and research use. The RANGE program from Heatley, Nation, and Coxhead is the original desktop profiler and the methodological ancestor of the rest. Lextutor's VocabProfile (and VP-Compleat) put the same logic in a browser, with multiple list options and a side-by-side colour-coded text display. Laurence Anthony's AntWordProfiler is a free cross-platform desktop successor to RANGE with custom-list support and batch processing for large corpora. The NGSL Profiler at newgeneralservicelist.com profiles texts against the Browne, Culligan, and Phillips lists. BYU's now-archived corpus.byu.edu interface offered profiling against COCA-derived frequency tiers.

Where designers use it

Materials writers profile draft passages before submitting them, comparing the off-list residue against the coursebook's running glossary to confirm that unfamiliar items are pre-taught, glossed, or replaced. Graded-reader editors profile manuscripts to verify that cumulative coverage reaches the 95% or 98% target within the bands assigned to the stated level. Researchers use profiling to characterise the lexical demands of authentic input (news articles, academic textbook chapters, film scripts) and to compare those demands against learner vocabulary-size estimates.
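The two editorial checks described here, finding the band at which a coverage target is met and listing off-list items the glossary does not yet handle, might be sketched as follows. The function names `band_reaching` and `unglossed` are hypothetical, and the cumulative figures and word lists are invented for illustration, not taken from any real profile.

```python
def band_reaching(cumulative, target):
    """Return the lowest band whose cumulative coverage meets the target, or None."""
    for band in sorted(cumulative):
        if cumulative[band] >= target:
            return band
    return None

def unglossed(off_list, glossary):
    """Return off-list items that the glossary does not cover, sorted."""
    return sorted(set(off_list) - set(glossary))

# Invented cumulative coverage per band (percent), for illustration only.
cumulative = {1: 82.5, 2: 91.0, 3: 95.4, 4: 97.1, 5: 98.2}

print(band_reaching(cumulative, 95.0))   # band where 95% is first reached
print(band_reaching(cumulative, 98.0))   # band where 98% is first reached
print(unglossed(["fjord", "Oslo", "trawler"], {"fjord"}))
```

With these invented figures, 95% is first reached at band 3 and 98% at band 5, so a text pitched at a band-3 reader would meet the lower target but not the higher one. The second check leaves "Oslo" and "trawler" as items still to be pre-taught, glossed, or replaced.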

References

Anthony, L. (2024). AntWordProfiler (Version 2.x) [Computer Software]. Tokyo, Japan: Waseda University. https://www.laurenceanthony.net/software/antwordprofiler/
Browne, C., Culligan, B., & Phillips, J. (2013). The New General Service List. https://www.newgeneralservicelist.com
Cobb, T. (2007). Computing the vocabulary demands of L2 reading. Language Learning & Technology, 11(3), 38–63. https://www.lltjournal.org/item/441/
Heatley, A., Nation, I. S. P., & Coxhead, A. (2002). RANGE and Frequency programs. Victoria University of Wellington. https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-analysis-programs
Nation, I. S. P. (2013). Learning Vocabulary in Another Language (2nd ed.). Cambridge University Press.
Nation, I. S. P. (2016). Making and Using Word Lists for Language Learning and Testing. John Benjamins.
