Standardization


Standardization is the process of training and calibrating examiners (raters) so that they apply a rating scale consistently. Without it, the same performance can receive different scores from different raters, which undermines inter-rater reliability and, by extension, test validity.

The Process

A typical standardization session follows this sequence:

Familiarization — raters study the rating scale descriptors and assessment criteria
Benchmarking — raters score sample performances that have been pre-rated by senior examiners; discrepancies are discussed
Practice rating — raters score additional samples independently, then compare and reconcile
Certification — raters must achieve acceptable agreement levels (often measured by exact or adjacent agreement rates, or correlation coefficients) before they are approved to rate live assessments
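The agreement measures named in the certification step can be sketched in a few lines of Python. This is a minimal illustration, not any testing body's actual procedure: the function name agreement_report, the sample scores, and the choice of "within one band" as the meaning of adjacent agreement are all assumptions made for the example.

```python
from statistics import mean


def agreement_report(trainee, benchmark):
    """Compare a trainee rater's scores with benchmark scores for the same samples.

    Hypothetical helper for illustration only. Both arguments are
    equal-length lists of numeric band scores.
    """
    if len(trainee) != len(benchmark) or len(trainee) < 2:
        raise ValueError("need two equal-length score lists with at least 2 samples")

    diffs = [t - b for t, b in zip(trainee, benchmark)]
    n = len(diffs)

    # Exact agreement: share of samples given the same score as the benchmark.
    exact = sum(d == 0 for d in diffs) / n
    # Adjacent agreement (assumed here to mean within one band, exact included).
    adjacent = sum(abs(d) <= 1 for d in diffs) / n

    # Pearson correlation, computed by hand to avoid version-specific helpers.
    mt, mb = mean(trainee), mean(benchmark)
    cov = sum((t - mt) * (b - mb) for t, b in zip(trainee, benchmark))
    var_t = sum((t - mt) ** 2 for t in trainee)
    var_b = sum((b - mb) ** 2 for b in benchmark)
    r = cov / (var_t * var_b) ** 0.5 if var_t and var_b else float("nan")

    return {
        "exact": exact,
        "adjacent": adjacent,
        "pearson_r": r,
        # Positive mean difference: the trainee scores higher than the benchmark.
        "mean_diff": mean(diffs),
    }


# Made-up scores for eight sample performances.
benchmark_scores = [5, 6, 7, 6, 8, 4, 7, 5]
trainee_scores = [5, 7, 7, 5, 8, 4, 6, 7]
report = agreement_report(trainee_scores, benchmark_scores)
print(report)
```

The cut-off a rater must reach on any of these measures is set by the testing programme, so none is hard-coded here. The mean difference is included because a rater can correlate highly with the benchmark while still scoring consistently higher or lower than it.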

Key Concepts

Rater severity/leniency: individual raters tend to be systematically harsh or generous; standardization aims to narrow the spread of severity across raters
Rater drift: even a trained rater's standards can shift over time, reducing consistency and requiring periodic re-standardization (IELTS examiners, for example, are re-certified regularly)
Many-facet Rasch measurement (Linacre 1989), also called multi-faceted Rasch measurement: a statistical approach that models rater severity as a measurable facet alongside candidate ability and task difficulty, enabling post-hoc adjustment of scores
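For readers who want to see how rater severity enters the model, the many-facet Rasch model is commonly written in the log-odds form below. The symbols follow the usual convention in the Rasch literature and are not notation taken from this entry.

```latex
\log\frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k
```

Here P_{nijk} is the probability that candidate n is awarded category k on task i by rater j, B_n is the candidate's ability, D_i the task's difficulty, C_j the rater's severity, and F_k the difficulty of moving from category k-1 to category k. Because C_j is estimated on the same scale as ability, a harsh rater's effect can be subtracted out when candidate measures are reported.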

IELTS as a Case Study

IELTS employs one of the most rigorous standardization systems in language testing. Examiners undergo initial certification training, are monitored through recorded assessments, and must pass periodic re-certification. Double marking is used for Writing. This infrastructure is what allows scores from different test centers worldwide to be meaningfully compared: the scale is intended to mean the same thing regardless of who applies it.

Practical Implication

Any institution using subjective assessment (speaking, writing) needs some form of standardization. Even informal moderation meetings, where teachers score the same student work and discuss their differences, can noticeably improve scoring consistency.
