Cut Score
A cut score (also cut-off score or pass mark) is the point on a score scale that separates one classification from another: pass from fail, one proficiency level from the next, admission from rejection. Setting it is one of the most consequential decisions in assessment: a single point can determine whether a candidate enters a university, receives a visa, or qualifies for a profession.
Cut scores are not discovered; they are set. There is no objectively correct boundary between Band 6 and Band 7, or between pass and fail. The goal is to set the boundary through a principled, transparent, and defensible process rather than arbitrarily.
Standard-Setting Methods
The Angoff Method (1971)
The most widely used approach. A panel of subject-matter experts examines each test item, and each judge estimates the probability that a "minimally competent" candidate would answer it correctly. Summing a judge's probabilities across all items gives that judge's expected score for the minimally competent candidate; the mean of these sums across the panel becomes the recommended cut score.
Strengths: Item-level judgments, widely understood, well-researched. Weaknesses: Experts struggle to imagine a "minimally competent" candidate; estimates can vary widely between judges.
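The Angoff arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a full standard-setting procedure: the function name `angoff_cut_score` and the ratings are hypothetical, and it assumes one probability estimate per judge per item with no discussion rounds.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimated probability that a minimally
    competent candidate answers item i correctly (hypothetical data layout)."""
    # Each judge's implied cut score is the sum of that judge's item estimates.
    judge_totals = [sum(judge) for judge in ratings]
    # The panel's recommended cut score is the mean of the judges' totals.
    return mean(judge_totals), judge_totals

# Hypothetical ratings: 3 judges, 4 items.
ratings = [
    [0.50, 0.75, 0.25, 0.50],
    [0.50, 0.50, 0.50, 0.50],
    [0.75, 0.75, 0.50, 0.50],
]
cut, judge_totals = angoff_cut_score(ratings)
spread = max(judge_totals) - min(judge_totals)  # how far the judges disagree
print(f"Recommended cut score: {cut:.2f} of {len(ratings[0])} (judge spread {spread:.2f})")
```

The spread between judges' totals is worth reporting alongside the cut score, since wide disagreement is the weakness noted above.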
The Bookmark Method
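Experts review items ordered from easiest to hardest (using difficulty estimates from IRT calibration). Each expert places a "bookmark" at the point where a minimally competent candidate would begin to struggle, that is, where the chance of a correct answer falls below an agreed response probability (0.67 is a common choice). The median bookmark placement across the panel determines the cut score.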
Strengths: More intuitive than Angoff: experts make holistic judgments about ordered items. Weaknesses: Requires IRT calibration; placement of the first few items heavily influences the final result.
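One way to turn bookmark placements into a cut score is sketched below. It rests on several assumptions that are not part of the description above: a Rasch (one-parameter) model, a response probability of 0.67, and a convention in which each bookmark is the zero-based index of the last item the borderline candidate is expected to master. Operational programmes differ on all three, and the function name and data are hypothetical.

```python
import math
from statistics import median

def bookmark_cut_theta(difficulties, bookmarks, rp=0.67):
    """Return the panel's cut score on the ability (theta) scale.

    difficulties: Rasch item difficulties (assumed already calibrated).
    bookmarks: one zero-based index per panellist into the ordered booklet.
    rp: response probability criterion (assumed 0.67).
    """
    ordered = sorted(difficulties)  # easiest item first
    # Under the Rasch model, P(correct) = rp when theta = b + ln(rp / (1 - rp)).
    offset = math.log(rp / (1 - rp))
    thetas = [ordered[k] + offset for k in bookmarks]
    # The median placement across panellists gives the cut score.
    return median(thetas)

# Hypothetical difficulties for six items and bookmarks from five panellists.
difficulties = [-1.5, -0.8, -0.2, 0.3, 0.9, 1.6]
bookmarks = [2, 3, 3, 4, 3]
cut_theta = bookmark_cut_theta(difficulties, bookmarks)
print(f"Cut score on the theta scale: {cut_theta:.2f}")
```

The result is an ability estimate, so it would still need to be converted to the reported score scale before use.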
The Borderline Method
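Used for performance assessments, and often called the borderline group method. Raters identify candidates whose performance is "borderline", on the edge between two levels. The mean score of these borderline candidates becomes the cut score; the median is a common alternative because it is less sensitive to outlying scores.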
Strengths: Grounded in actual candidate performances, not hypothetical judgments. Weaknesses: Requires raters to make a separate "borderline" judgment, which introduces its own subjectivity.
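The borderline calculation can be sketched as follows. The scores, the judgment labels, and the function name `borderline_group_cut` are hypothetical; the sketch returns both the mean and the median so the two can be compared.

```python
from statistics import mean, median

def borderline_group_cut(scores, judgments, label="borderline"):
    """Return (mean, median) of the scores of candidates judged borderline."""
    borderline = [s for s, j in zip(scores, judgments) if j == label]
    if not borderline:
        raise ValueError("no candidates were judged borderline")
    return mean(borderline), median(borderline)

# Hypothetical performance scores and the raters' separate global judgments.
scores = [12, 15, 16, 18, 23, 22, 25]
judgments = ["fail", "fail", "borderline", "borderline", "borderline", "pass", "pass"]

mean_cut, median_cut = borderline_group_cut(scores, judgments)
print(f"Mean of borderline group: {mean_cut}, median: {median_cut}")
```

In this made-up data one borderline candidate scored 23, which pulls the mean above the median; with small borderline groups the choice between the two can move the cut score by a point or more.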
Setting Cut Scores Responsibly
| Principle | Explanation |
|---|---|
| Use multiple methods | No single method is definitive; triangulating methods builds confidence |
| Use panels, not individuals | One person's judgment is insufficient; panels of 8–15 experts are typical |
| Provide impact data | Show panellists what percentage of candidates would pass/fail at various cut points |
| Consider consequences | What is the cost of a false positive (passing someone who should fail) vs a false negative (failing someone who should pass)? |
| Document everything | The process must be transparent and reproducible |
| Review periodically | Cut scores may need adjustment as populations, curricula, or test forms change |
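Impact data of the kind mentioned in the table can be produced with a short calculation. The sketch below assumes that a candidate passes when the score is at or above the cut; the score list and the function name `pass_rates` are hypothetical.

```python
def pass_rates(scores, cut_points):
    """Proportion of candidates at or above each candidate cut point."""
    n = len(scores)
    return {cut: sum(s >= cut for s in scores) / n for cut in cut_points}

# Hypothetical scores for 20 candidates.
scores = [48, 52, 55, 58, 60, 61, 63, 64, 65, 66,
          68, 70, 72, 75, 78, 80, 83, 85, 90, 95]

impact = pass_rates(scores, cut_points=[60, 65, 70])
for cut, rate in impact.items():
    print(f"Cut score {cut}: {rate:.0%} pass, {1 - rate:.0%} fail")
```

Showing panellists a table like this after their first round of judgments lets them see how much the pass rate moves when the cut point shifts by a few points.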
The Measurement Error Problem
In Classical Test Theory, every observed score is the sum of a true score and measurement error. A candidate scoring 64 on a test with a pass mark of 65 may therefore have a true score above or below the cut. The standard error of measurement (SEM) quantifies this uncertainty. Responsible testing programmes:
- Report confidence intervals alongside scores
- Provide review/appeal mechanisms for borderline candidates
- Avoid treating cut scores as absolute truths
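In Classical Test Theory the SEM is computed as the standard deviation of scores multiplied by the square root of one minus the reliability. The sketch below applies that formula to the 64-versus-65 example; the standard deviation (10) and reliability (0.91) are hypothetical values chosen for illustration, and the band assumes normally distributed error.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score (z = 1.96)."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical test statistics: SD = 10, reliability = 0.91, so SEM is about 3.
low, high = score_band(64, sd=10, reliability=0.91)
cut_within_band = low <= 65 <= high
print(f"Observed 64, band {low:.1f} to {high:.1f}; cut of 65 inside band: {cut_within_band}")
```

With these values the band runs from roughly 58 to 70, so a fail decision at 64 cannot be distinguished from a pass on measurement grounds alone, which is the case for appeal mechanisms.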
Cut Scores in Language Testing
In the IELTS system, band scores serve as the scale on which receiving institutions set their own cut scores: a university requiring Band 6.5 overall with no band below 6.0 has set a multi-component cut score. The IELTS partnership itself sets the band boundaries through descriptor-based standard setting, supported by examiner standardisation.
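A multi-component requirement of this kind is a conjunctive rule: every condition must be met. The sketch below is illustrative only; the function name `meets_requirement` and the candidate profiles are hypothetical, and the overall band is taken as reported rather than recomputed from the components.

```python
def meets_requirement(overall, components, min_overall=6.5, min_component=6.0):
    """True only if the overall band and every component band reach their minimums."""
    return overall >= min_overall and all(
        band >= min_component for band in components.values()
    )

# Hypothetical candidates with reported overall and component bands.
candidate_a = {"listening": 7.0, "reading": 6.5, "writing": 6.0, "speaking": 6.5}
candidate_b = {"listening": 8.0, "reading": 7.5, "writing": 5.5, "speaking": 7.0}

a_admitted = meets_requirement(6.5, candidate_a)
b_admitted = meets_requirement(7.0, candidate_b)
print(f"Candidate A admitted: {a_admitted}; candidate B admitted: {b_admitted}")
```

Candidate B has the higher overall band but is rejected on the Writing component, which shows how a conjunctive rule prevents strength in one skill from compensating for weakness in another.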
In criterion-referenced institutional testing, the pass mark should reflect the minimum level of ability needed for the next stage, not an arbitrary percentage like 50% or 60%.
Key References
- Cizek, G. J. & Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Sage.
- Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed.). American Council on Education.
- Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice. Oxford University Press.
See Also
- Band Descriptors: define what performance looks like at each level
- High-Stakes Testing: where cut score decisions carry the greatest weight
- Criterion-referenced Testing: cut scores are central to criterion-referenced interpretation
- Classical Test Theory: measurement error affects classification accuracy at cut points