Cut Score
A cut score (also cut-off score or pass mark) is the point on a score scale that separates one classification from another — pass from fail, one proficiency level from the next, admission from rejection. It is one of the most consequential decisions in assessment: a single point can determine whether a candidate enters a university, receives a visa, or qualifies for a profession.
Cut scores are not discovered — they are set. There is no objectively correct boundary between Band 6 and Band 7, or between pass and fail. The goal is to set the boundary through a principled, transparent, and defensible process rather than arbitrarily.
Standard-Setting Methods
The Angoff Method (1971)
The most widely used approach. A panel of subject-matter experts examines each test item and estimates the probability that a "minimally competent" candidate would answer it correctly. Each judge's probabilities are summed across all items, and the judges' sums are averaged to give the recommended cut score.
Strengths: Item-level judgments, widely understood, well-researched. Weaknesses: Experts struggle to imagine a "minimally competent" candidate; estimates can vary widely between judges.
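The Angoff arithmetic can be sketched in a few lines. The panel data below is entirely hypothetical; `angoff_cut_score` is an illustrative helper name, not part of any standard library.

```python
# Hypothetical Angoff panel: one row per judge, one column per item.
# Each entry is that judge's estimate of the probability that a
# minimally competent candidate answers the item correctly.
ratings = [
    [0.6, 0.8, 0.4, 0.9, 0.5],  # judge 1
    [0.7, 0.7, 0.5, 0.8, 0.6],  # judge 2
    [0.5, 0.9, 0.3, 0.9, 0.4],  # judge 3
]

def angoff_cut_score(ratings):
    """Sum each judge's item probabilities, then average across judges."""
    judge_sums = [sum(judge) for judge in ratings]
    return sum(judge_sums) / len(judge_sums)

print(round(angoff_cut_score(ratings), 2))  # 3.17 out of 5 items
```

The spread of the individual judge sums (here 3.0 to 3.3) is itself useful evidence: wide disagreement signals that the panel does not share a picture of the minimally competent candidate.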
The Bookmark Method
Experts review items ordered by difficulty (from IRT calibration). Each expert places a "bookmark" at the point where a minimally competent candidate would begin to struggle. The median bookmark placement determines the cut score.
Strengths: More intuitive than Angoff — experts make holistic judgments about ordered items. Weaknesses: Requires IRT calibration; placement of the first few items heavily influences the final result.
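A simplified sketch of the bookmark logic, assuming items already ordered by IRT difficulty. The difficulty values and bookmark placements are invented for illustration; real implementations also involve a response-probability criterion that is omitted here.

```python
import statistics

# Items sorted easiest to hardest, with illustrative IRT difficulty
# (b) values; these are not from a real calibration.
item_difficulties = [-2.0, -1.4, -0.9, -0.3, 0.2, 0.8, 1.5, 2.1]

# Each judge's bookmark: the index of the first item a minimally
# competent candidate would begin to struggle with.
bookmarks = [4, 5, 4, 3, 5]

def bookmark_cut(item_difficulties, bookmarks):
    """Take the median bookmark; set the cut at that item's difficulty."""
    median_idx = int(statistics.median(bookmarks))
    return item_difficulties[median_idx]

print(bookmark_cut(item_difficulties, bookmarks))  # 0.2
```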
The Borderline Method
Used for performance assessments. Raters identify candidates whose performance is "borderline" — on the edge between two levels. The mean score of borderline candidates becomes the cut score.
Strengths: Grounded in actual candidate performances, not hypothetical judgments. Weaknesses: Requires raters to make a separate "borderline" judgment, which introduces its own subjectivity.
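The borderline calculation is the simplest of the three. The scores and rater labels below are hypothetical:

```python
# Hypothetical performance scores with rater classifications.
performances = [
    (72, "pass"), (48, "fail"), (61, "borderline"),
    (58, "borderline"), (80, "pass"), (63, "borderline"),
]

def borderline_cut(performances):
    """Mean score of the candidates rated borderline becomes the cut."""
    scores = [s for s, label in performances if label == "borderline"]
    return sum(scores) / len(scores)

print(round(borderline_cut(performances), 2))  # 60.67
```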
Setting Cut Scores Responsibly
| Principle | Explanation |
|---|---|
| Use multiple methods | No single method is definitive; triangulating methods builds confidence |
| Use panels, not individuals | One person's judgment is insufficient; panels of 8–15 experts are typical |
| Provide impact data | Show panellists what percentage of candidates would pass/fail at various cut points |
| Consider consequences | What is the cost of a false positive (passing someone who should fail) vs a false negative (failing someone who should pass)? |
| Document everything | The process must be transparent and reproducible |
| Review periodically | Cut scores may need adjustment as populations, curricula, or test forms change |
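Impact data, in particular, is straightforward to generate. A minimal sketch, assuming a hypothetical score distribution and candidate cut points:

```python
# Hypothetical observed scores from a pilot administration.
scores = [45, 52, 58, 60, 61, 63, 65, 67, 70, 74, 78, 81]

def pass_rate(scores, cut):
    """Proportion of candidates at or above a candidate cut score."""
    return sum(1 for s in scores if s >= cut) / len(scores)

# Show panellists the consequences of each candidate cut point.
for cut in (60, 63, 65):
    print(f"cut {cut}: {pass_rate(scores, cut):.0%} would pass")
```

Seeing that moving the cut from 60 to 65 drops the pass rate from 75% to 50% often prompts panels to revisit their item-level judgments.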
The Measurement Error Problem
Every observed score contains measurement error (Classical Test Theory). A candidate scoring 64 on a test with a pass mark of 65 may have a true score above or below the cut. The standard error of measurement (SEM) quantifies this uncertainty. Responsible testing programmes:
- Report confidence intervals alongside scores
- Provide review/appeal mechanisms for borderline candidates
- Avoid treating cut scores as absolute truths
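The uncertainty around a borderline score can be made concrete. The sketch below uses the Classical Test Theory identity SEM = SD * sqrt(1 - reliability) and a conventional 95% interval; the SD and reliability values are illustrative.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% interval around an observed score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Example: SD = 10, reliability = 0.91 -> SEM = 3.0.
lo, hi = confidence_interval(64, sd=10, reliability=0.91)
# The pass mark of 65 falls inside (58.1, 69.9), so classifying
# this candidate as a fail is statistically uncertain.
print(lo < 65 < hi)  # True
```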
Cut Scores in Language Testing
In the IELTS system, band scores function as cut scores set by receiving institutions: a university requiring Band 6.5 overall with no band below 6.0 has set a multi-component cut score. The IELTS partnership itself sets the band boundaries through descriptor-based standard setting, anchored by band descriptors and examiner standardisation.
In criterion-referenced institutional testing, the pass mark should reflect the minimum level of ability needed for the next stage — not an arbitrary percentage like 50% or 60%.
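A multi-component requirement like the one above is a conjunctive rule: the overall cut and every component cut must all be met. A minimal sketch, with a hypothetical helper and example thresholds modelled on the "6.5 overall, no band below 6.0" case:

```python
def meets_requirement(bands, overall_min=6.5, component_min=6.0):
    """Conjunctive multi-component cut: the overall band and every
    individual band must each clear its own threshold.
    (Illustrative helper; thresholds vary by institution.)"""
    components = [v for k, v in bands.items() if k != "overall"]
    return (bands["overall"] >= overall_min
            and all(b >= component_min for b in components))

candidate = {"listening": 7.0, "reading": 6.5, "writing": 6.0,
             "speaking": 6.5, "overall": 6.5}
print(meets_requirement(candidate))  # True

# A single weak component fails the whole profile.
candidate["writing"] = 5.5
print(meets_requirement(candidate))  # False
```

Note that conjunctive rules are stricter than the overall cut alone, which is exactly why institutions use them: they prevent a high score in one skill masking inadequacy in another.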
Key References
- Cizek, G. J. & Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Sage.
- Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed.). American Council on Education.
- Bachman, L. F. & Palmer, A. S. (1996). Language Testing in Practice. Oxford University Press.
See Also
- Band Descriptors — define what performance looks like at each level
- High-Stakes Testing — where cut score decisions carry the greatest weight
- Criterion-referenced Testing — cut scores are central to criterion-referenced interpretation
- Classical Test Theory — measurement error affects classification accuracy at cut points