Rating Scale

Tags: Assessment, rating scales, band descriptors, scoring rubric, marking criteria

A rating scale is a set of ordered descriptors used to evaluate language performance. Each level (band) defines what a test-taker can and cannot do, providing the criteria that link observed performance to numerical scores. Rating scales operationalise the construct — they make abstract ability visible and scorable.

Scale Types

Analytic scales — Separate scores on multiple criteria. IELTS Writing uses four analytic criteria (Task Achievement, called Task Response in Task 2; Coherence & Cohesion; Lexical Resource; Grammatical Range & Accuracy), each scored on a 0-9 band scale.

Holistic scales — A single global score based on an overall impression. The TOEFL iBT independent writing task historically used a holistic 0-5 scale.

See Analytic vs Holistic Scoring for a detailed comparison.
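To make the analytic/holistic contrast concrete, the sketch below stores one script's analytic profile on the four IELTS Writing criteria and then collapses it into a single number. The ratings are invented, and the aggregation rule (equal weights, mean rounded to the nearest half band) is a hypothetical assumption, not the official IELTS procedure. The helper `weakest_criteria` shows the diagnostic information a holistic rater, who records only one global score, never produces.

```python
import math
from statistics import mean

# The four analytic criteria listed above for IELTS Writing.
CRITERIA = (
    "Task Achievement",
    "Coherence & Cohesion",
    "Lexical Resource",
    "Grammatical Range & Accuracy",
)


def overall_band(scores: dict[str, float]) -> float:
    """Collapse an analytic profile into one number.

    Hypothetical rule: equal weights, mean rounded to the nearest half band
    with halves rounding up. Not an official IELTS procedure."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    avg = mean(scores[c] for c in CRITERIA)
    return math.floor(avg * 2 + 0.5) / 2


def weakest_criteria(scores: dict[str, float]) -> list[str]:
    """Diagnostic information that a single holistic score does not provide."""
    lowest = min(scores[c] for c in CRITERIA)
    return [c for c in CRITERIA if scores[c] == lowest]


# Invented ratings for one script.
script = {
    "Task Achievement": 7,
    "Coherence & Cohesion": 6,
    "Lexical Resource": 6,
    "Grammatical Range & Accuracy": 5,
}
print(overall_band(script))      # 6.0
print(weakest_criteria(script))  # ['Grammatical Range & Accuracy']
```

Both scripts below would receive the same reported number under many aggregation rules, yet their profiles point to different teaching priorities; that loss of detail is the main cost of reporting a single score.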

Development Approaches

Intuitive — Expert judgement produces descriptors (traditional method)
Empirical — Descriptors derived from analysis of actual performances at each level (data-driven; North, 2000)
Theory-based — Descriptors derived from a model of communicative competence (Bachman & Palmer, 1996)

The CEFR scales (North & Schneider, 1998) were calibrated by empirical Rasch scaling of teacher judgements, which places them among the most rigorously developed sets of descriptors in language assessment.
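The idea behind Rasch scaling can be sketched with the simplest (dichotomous) Rasch model: the probability that a teacher judges a learner able to do a descriptor is P = exp(θ - b) / (1 + exp(θ - b)), where θ is the learner's ability and b the descriptor's difficulty, both in logits. The sketch below estimates b for each descriptor by joint maximum likelihood using plain gradient ascent. This is a deliberately simplified illustration under stated assumptions: the yes/no judgement data are invented, the function names are hypothetical, and the actual North & Schneider analysis is not reproduced here.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a learner of ability `theta`
    is judged able to do a descriptor of difficulty `b` (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate(responses: list[list[int]], iterations: int = 500, lr: float = 0.1):
    """Joint maximum likelihood estimation by plain gradient ascent.

    responses[p][i] is 1 if learner p was judged able to do descriptor i, else 0.
    Assumes no learner and no descriptor has an all-0 or all-1 pattern,
    because extreme scores have no finite estimate."""
    n_people, n_items = len(responses), len(responses[0])
    theta = [0.0] * n_people
    b = [0.0] * n_items
    for _ in range(iterations):
        # d logL / d theta_p = sum_i (x_pi - P_pi)
        for p in range(n_people):
            grad = sum(responses[p][i] - rasch_prob(theta[p], b[i]) for i in range(n_items))
            theta[p] += lr * grad
        # d logL / d b_i = sum_p (P_pi - x_pi)
        for i in range(n_items):
            grad = sum(rasch_prob(theta[p], b[i]) - responses[p][i] for p in range(n_people))
            b[i] += lr * grad
        # The scale origin is arbitrary: centre difficulties on 0 and shift
        # abilities by the same amount so every theta - b is unchanged.
        shift = sum(b) / n_items
        b = [x - shift for x in b]
        theta = [t - shift for t in theta]
    return theta, b


# Invented toy data: 6 learners x 4 descriptors.
judgements = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
abilities, difficulties = estimate(judgements)
for i, d in enumerate(difficulties):
    print(f"descriptor {i}: {d:+.2f} logits")
```

Descriptors that fewer learners are judged able to do come out with higher difficulty, which is what lets a scale's bands be ordered from data rather than from intuition alone.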

Descriptor Quality

Good descriptors are:

Positively worded — State what the learner can do, not what they cannot
Independent — Each criterion is distinct, not overlapping
Observable — Refer to behaviours, not inferred mental states
Hierarchically ordered — Clear progression from one band to the next
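Two of these qualities can be partly screened mechanically when drafting a scale. The sketch below is a hypothetical helper, `check_scale`, that takes a mapping from band number to descriptor text and flags gaps in the band sequence, adjacent bands with identical wording (no progression), and wording that looks negative. The negative-wording check is a crude regular expression that can only nominate candidates for human review; independence and observability still need expert judgement. The sample descriptors are invented.

```python
import re

# Crude, hypothetical heuristic: flags candidates for review, not verdicts.
NEGATIVE_WORDING = re.compile(r"\b(cannot|can't|unable|fails? to|not|never)\b", re.IGNORECASE)


def check_scale(bands: dict[int, str]) -> list[str]:
    """Return warnings about a band -> descriptor mapping."""
    if not bands:
        return ["scale has no bands"]
    problems = []
    levels = sorted(bands)
    # Hierarchical ordering: levels should form an unbroken sequence...
    if levels != list(range(levels[0], levels[-1] + 1)):
        problems.append(f"band levels are not contiguous: {levels}")
    # ...and each band should say something different from the band below it.
    for prev, level in zip(levels, levels[1:]):
        if bands[level].strip().lower() == bands[prev].strip().lower():
            problems.append(f"bands {prev} and {level} have identical descriptors")
    # Positive wording: flag descriptors phrased as deficits.
    for level in levels:
        if NEGATIVE_WORDING.search(bands[level]):
            problems.append(f"band {level}: possibly negatively worded: {bands[level]!r}")
    return problems


# Invented draft descriptors for illustration only.
draft = {
    1: "Can write simple isolated phrases about familiar topics.",
    2: "Can link simple phrases with basic connectors such as 'and' and 'but'.",
    3: "Cannot organise ideas into paragraphs.",
}
for warning in check_scale(draft):
    print(warning)
```

Here only band 3 is flagged; rewriting it as a can-do statement describing what the learner does produce would address the warning.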

Teaching Implications

Sharing rating scale descriptors with learners can improve self-assessment and goal-setting
Training teachers to use scales consistently requires standardisation sessions with benchmark samples
Scales shape washback: what the descriptors emphasise is what teachers tend to teach
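Standardisation sessions usually check a trainee rater's scores against the benchmark scores agreed for each sample. The sketch below computes two commonly reported indices: exact agreement (and exact-plus-adjacent agreement, within one band), and Cohen's kappa with quadratic weights, which penalises large band gaps more than small ones and corrects for chance agreement. Function names, the 0-9 band range in the usage example, and the ratings themselves are illustrative assumptions; any acceptance thresholds would be set by the testing programme.

```python
def agreement_rate(rater: list[int], benchmark: list[int], tolerance: int = 0) -> float:
    """Share of samples where the rater's band is within `tolerance` of the benchmark.

    tolerance=0 gives exact agreement; tolerance=1 gives exact-plus-adjacent agreement."""
    if not rater or len(rater) != len(benchmark):
        raise ValueError("need two non-empty rating lists of equal length")
    hits = sum(abs(r - b) <= tolerance for r, b in zip(rater, benchmark))
    return hits / len(rater)


def quadratic_weighted_kappa(a: list[int], b: list[int], min_band: int, max_band: int) -> float:
    """Cohen's kappa with quadratic disagreement weights (i - j)^2."""
    if not a or len(a) != len(b):
        raise ValueError("need two non-empty rating lists of equal length")
    if any(not min_band <= x <= max_band for x in a + b):
        raise ValueError("rating outside the band range")
    k = max_band - min_band + 1
    observed = [[0] * k for _ in range(k)]
    for x, y in zip(a, b):
        observed[x - min_band][y - min_band] += 1
    n = len(a)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    obs_disagreement = exp_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2  # normalising by (k - 1)^2 would cancel in the ratio
            obs_disagreement += weight * observed[i][j]
            # Expected count under chance: product of the two raters' marginals.
            exp_disagreement += weight * row_totals[i] * col_totals[j] / n
    if exp_disagreement == 0:
        raise ValueError("kappa is undefined when both raters use a single band")
    return 1.0 - obs_disagreement / exp_disagreement


# Invented ratings: a trainee versus benchmark scores on five samples.
trainee = [5, 6, 6, 7, 4]
benchmark = [5, 6, 7, 7, 6]
print(agreement_rate(trainee, benchmark))               # 0.6
print(agreement_rate(trainee, benchmark, tolerance=1))  # 0.8
print(quadratic_weighted_kappa(trainee, benchmark, 0, 9))
```

Exact agreement is easy to explain to raters, while the weighted kappa distinguishes a trainee who is often one band off from one who is occasionally three bands off.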
