Standardization
Standardization is the process of training and calibrating examiners/raters so they apply a rating scale consistently. Without it, the same performance can receive different scores from different raters — undermining inter-rater reliability and, by extension, test validity.
The Process
A typical standardization session follows this sequence:
- Familiarization — raters study the rating scale descriptors and assessment criteria
- Benchmarking — raters score sample performances that have been pre-rated by senior examiners; discrepancies are discussed
- Practice rating — raters score additional samples independently, then compare and reconcile
- Certification — raters must achieve acceptable agreement levels (often measured by exact or adjacent agreement rates, or correlation coefficients) before they are approved to rate live assessments
Key Concepts
- Rater severity/leniency — individual raters tend to be systematically harsh or generous; standardization aims to narrow this range
- Rater drift — even trained raters become less consistent over time, requiring re-standardization (IELTS examiners are re-certified regularly)
- Multi-faceted Rasch measurement (Linacre 1989) — a statistical approach that models rater severity as a measurable facet alongside candidate ability and task difficulty, enabling post-hoc adjustment
IELTS as a Case Study
IELTS employs one of the most rigorous standardization systems in language testing. Examiners undergo initial certification training, are monitored through recorded assessments, and must pass re-certification. Double marking is used for Writing. This infrastructure is what allows scores from different test centers worldwide to be meaningfully compared — the scale means the same thing regardless of who rates it.
Practical Implication
Any institution using subjective assessment (speaking, writing) needs some form of standardization. Even informal moderation meetings — where teachers score the same student work and discuss differences — substantially improve scoring consistency.