ELTiverse

Consequential Validity

Assessment

Consequential validity concerns the social consequences of test use — both intended and unintended. It asks not just "Does this test measure what it claims to measure?" but "What happens as a result of using this test? Who benefits? Who is harmed?"

The concept was introduced by Samuel Messick (1989) as part of his unified theory of validity. Messick argued that validity is not a property of the test itself but of the interpretations and uses of test scores. Consequences are an integral part of the validity argument, not an afterthought.

Messick's Validity Framework

Messick (1989) proposed a 2×2 matrix:

                    | Evidential basis   | Consequential basis
Test interpretation | Construct validity | Value implications
Test use            | Relevance/utility  | Social consequences

Traditional validation focuses on the evidential basis (the left-hand column): does the test measure the right construct? Consequential validity extends the inquiry to the right-hand column: what values does the test embody, and what effects does its use produce?

What Counts as a Consequence?

Intended consequences

  • Students who reach the required proficiency level gain access to university education
  • Teachers focus on communicative skills because the test rewards them
  • Institutions identify students who need additional support

Unintended consequences

  • A writing test that rewards formulaic structure produces learners who write to templates rather than thinking critically
  • A high-stakes test creates an industry of expensive preparation courses that disadvantage poorer candidates
  • A placement test systematically underplaces students from certain L1 backgrounds due to cultural bias in task topics
  • Test anxiety undermines the performance of otherwise capable candidates

The Debate

Consequential validity is the most contested aspect of Messick's framework. Critics (e.g., Popham, 1997; Mehrens, 1997) argue that:

  • Consequences are a matter of test use, not test validity — a valid test can be misused
  • Holding test developers responsible for all social consequences is unreasonable
  • The concept blurs the distinction between technical quality and political/ethical judgment

Defenders (including Messick himself) counter that:

  • Validity has always been about the appropriateness of inferences and actions based on scores
  • Ignoring consequences allows harmful tests to hide behind technical adequacy
  • Test developers who publish a test bear some responsibility for foreseeable effects of its use

Implications for Language Testing

Context               | Consequential concern
IELTS for immigration | A visa can be refused over a 0.5-band difference. Is the measurement precise enough to support that decision?
National exit exams   | Does the test narrow the curriculum? Does access to preparation vary by socioeconomic status (SES)?
Placement testing     | Does the test systematically place certain groups lower, limiting their access to higher-level instruction?
Washback              | Does the test shape teaching in positive or negative ways?
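
The precision question raised for the 0.5-band case can be made concrete with a small sketch. It uses the classical approximation of an interval around an observed score (observed ± z × SEM, where SEM is the standard error of measurement). The function names and the SEM of 0.3 bands are hypothetical, chosen only for illustration; they are not published figures for IELTS or any other test.

```python
from statistics import NormalDist


def score_interval(observed: float, sem: float, confidence: float = 0.95) -> tuple[float, float]:
    """Return a symmetric interval around an observed score.

    Classical approximation: observed +/- z * SEM, where z is the
    standard normal quantile for the requested confidence level.
    """
    if sem < 0:
        raise ValueError("sem must be non-negative")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sem
    return observed - margin, observed + margin


def cut_score_within_error(observed: float, cut_score: float, sem: float,
                           confidence: float = 0.95) -> bool:
    """True if the cut score lies inside the interval around the observed score,
    i.e. the pass/fail decision cannot be separated from measurement error."""
    low, high = score_interval(observed, sem, confidence)
    return low <= cut_score <= high


# Hypothetical scenario: the visa requires band 7.0 and the SEM is ASSUMED
# to be 0.3 bands (an illustrative value, not a real test statistic).
HYPOTHETICAL_SEM = 0.3
print(cut_score_within_error(6.5, 7.0, HYPOTHETICAL_SEM))  # True
print(cut_score_within_error(5.5, 7.0, HYPOTHETICAL_SEM))  # False
```

Under these assumed values, a candidate at 6.5 cannot be confidently distinguished from one at 7.0, whereas a candidate at 5.5 can. Whether a real decision is defensible depends on the test's actual reported measurement error.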

In Practice

Test developers and users can address consequential validity by:

  1. Monitoring washback — Researching how the test affects teaching and learning
  2. Analysing differential performance — Checking whether the test functions differently across groups (gender, L1, SES)
  3. Reviewing score use — Ensuring scores are used for purposes the test was designed for
  4. Providing guidance — Publishing clear guidelines on appropriate score interpretation and use
  5. Revisiting decisions — Reviewing high-stakes decisions regularly to check for unintended patterns
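
As one illustration of step 2, the sketch below computes a Mantel-Haenszel common odds ratio for a single test item, a widely used way of checking whether an item functions differently for two groups of comparable ability. It is a minimal sketch, not a full differential item functioning (DIF) analysis: the function names and all counts are hypothetical, and candidates are assumed to have already been grouped into ability strata (for example, by total score).

```python
import math


def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across ability strata for one item.

    Each stratum is a tuple of counts:
    (reference_correct, reference_incorrect, focal_correct, focal_incorrect).
    A value near 1.0 suggests the item behaves similarly for both groups;
    values above 1.0 favour the reference group.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # an empty stratum contributes nothing
        numerator += ref_right * focal_wrong / total
        denominator += ref_wrong * focal_right / total
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these counts")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """Rescale the odds ratio to the delta metric (-2.35 * ln(odds ratio)).

    Negative values indicate the item is harder for the focal group
    than for reference-group candidates of matched ability.
    """
    return -2.35 * math.log(odds_ratio)


# Hypothetical counts for one item in two ability strata (invented data).
balanced_item = [(30, 20, 15, 10), (40, 10, 20, 5)]
suspect_item = [(30, 20, 10, 15), (40, 10, 15, 10)]

print(mantel_haenszel_odds_ratio(balanced_item))
print(mh_d_dif(mantel_haenszel_odds_ratio(suspect_item)))
```

A statistic of this kind only flags items for review. Deciding whether a flagged item reflects bias or a genuine, construct-relevant difference remains a matter of judgment.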

Key References

  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13–103). American Council on Education/Macmillan.
  • Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241–256.
  • McNamara, T., & Roever, C. (2006). Language Testing: The Social Dimension. Blackwell.
