Rating Scale
A rating scale is a set of ordered descriptors used to evaluate language performance. Each level (band) defines what a test-taker can and cannot do, providing the criteria that link observed performance to numerical scores. Rating scales operationalise the construct — they make abstract ability visible and scorable.
Scale Types
Analytic scales — Separate scores on multiple criteria. IELTS Writing uses four analytic criteria (Task Achievement in Task 1 or Task Response in Task 2, Coherence & Cohesion, Lexical Resource, Grammatical Range & Accuracy), each scored 0-9.
Holistic scales — A single global score based on an overall impression. The TOEFL iBT independent writing task historically used a holistic 0-5 scale.
See Analytic vs Holistic Scoring for a detailed comparison.
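The difference between the two scale types can be sketched in a few lines of Python. This is a minimal illustration, not an official scoring procedure: the scores are invented, and both the equal weighting and the round-half-up rule in `half_band` are assumptions of the sketch.

```python
import math
from statistics import mean

# Criterion names follow the IELTS Writing list above; the scores are invented.
analytic_scores = {
    "Task Achievement": 6,
    "Coherence & Cohesion": 7,
    "Lexical Resource": 6,
    "Grammatical Range & Accuracy": 7,
}

def half_band(value):
    """Round to the nearest half band, with ties rounding up (assumed convention)."""
    return math.floor(value * 2 + 0.5) / 2

def analytic_total(scores):
    """Combine criterion scores with equal weights (an assumption of this sketch)."""
    return half_band(mean(list(scores.values())))

# A holistic rating is a single judgement, so there is nothing to combine.
holistic_score = 4  # hypothetical rating on a 0-5 scale

overall = analytic_total(analytic_scores)
# Analytic scores also show where a performance is weakest (first minimum wins).
weakest = min(analytic_scores, key=analytic_scores.get)
```

The analytic record keeps a profile that can feed diagnostic feedback, whereas the holistic record stores only the overall impression.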
Development Approaches
- Intuitive — Expert judgement produces descriptors (traditional method)
- Empirical — Descriptors derived from analysis of actual performances at each level (data-driven; North, 2000)
- Theory-based — Descriptors derived from a model of communicative competence (Bachman & Palmer, 1996)
The CEFR scales (North & Schneider, 1998) were calibrated through empirical Rasch scaling of teacher judgements, and they are often regarded as among the most rigorously developed sets of descriptors in language assessment.
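To give a sense of what Rasch scaling does, the sketch below implements only the simplest (dichotomous) form of the Rasch model, in which a learner's ability and a descriptor's difficulty sit on the same logit scale. It is not the analysis used in the CEFR project, and the function name and the example values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success under the dichotomous Rasch model.

    Both arguments are in logits. When ability equals difficulty the
    probability is 0.5; it rises as ability exceeds difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical descriptor difficulties (logits), ordered from easy to hard.
descriptor_difficulties = {"descriptor_a": -1.0, "descriptor_b": 0.5, "descriptor_c": 2.0}

learner_ability = 0.5  # hypothetical learner location on the same scale
profile = {
    name: rasch_probability(learner_ability, difficulty)
    for name, difficulty in descriptor_difficulties.items()
}
```

Because descriptors and learners share one scale, sorting descriptors by estimated difficulty yields an empirically grounded ordering that can then be cut into bands.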
Descriptor Quality
Good descriptors are:
- Positively worded — State what the learner can do, not what they cannot
- Independent — Each criterion is distinct, not overlapping
- Observable — Refer to behaviours, not inferred mental states
- Hierarchically ordered — Clear progression from one band to the next
Teaching Implications
- Sharing rating scale descriptors with learners can improve self-assessment and goal-setting
- Training teachers to use scales consistently requires standardisation sessions with benchmark samples
- Scales shape Washback: teachers tend to teach what the descriptors emphasise
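One simple way to monitor consistency in a session with benchmark samples is to compare a trainee's ratings with the agreed benchmark scores. The sketch below computes exact agreement and agreement within one band; the function name and data are hypothetical, and no particular pass threshold is implied.

```python
def agreement_rates(rater_scores, benchmark_scores):
    """Return (exact, adjacent) agreement as proportions of scripts.

    exact    -- rater gave the same band as the benchmark
    adjacent -- rater was within one band of the benchmark (includes exact)
    """
    if len(rater_scores) != len(benchmark_scores):
        raise ValueError("Both score lists must cover the same scripts")
    if not rater_scores:
        raise ValueError("At least one script is required")
    diffs = [abs(r - b) for r, b in zip(rater_scores, benchmark_scores)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs) / n
    adjacent = sum(d <= 1 for d in diffs) / n
    return exact, adjacent

# Hypothetical band scores for five benchmark scripts.
benchmark = [5, 6, 7, 4, 8]
trainee = [5, 7, 7, 6, 8]

exact, adjacent = agreement_rates(trainee, benchmark)
```

Here the trainee matches the benchmark on three of five scripts and is within one band on four, which points to the fourth script as the one to discuss.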