Content Validity
Content validity is the extent to which a test's items and tasks adequately sample the content domain the test claims to cover. It asks: Does this test represent the full range of knowledge, skills, and abilities that define the domain?
Unlike construct validity, which rests on statistical and theoretical analysis, content validity is established primarily through expert judgment: systematic evaluation of how well test content maps to the domain specification.
The Domain Sampling Problem
No test can assess everything within a domain. A "grammar test" cannot test every structure; a "reading test" cannot use every possible text type. Content validity is about whether the sample is representative of the population of possible items and tasks.
Hughes (2003) frames it as a specification-matching exercise:
- Define the content domain (what should be assessed)
- Create a test specification that maps the domain
- Write items that sample from the specification
- Evaluate whether the sample is representative and balanced (a sketch of this check follows the list)
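This mapping lends itself to a simple audit. The sketch below (Python, with hypothetical area names, weightings, and items) tags each drafted item with the specification area it samples, then reports which areas are missing and how the observed proportions compare with the targets.

```python
# Minimal sketch of Hughes' specification-matching exercise.
# Area names, weightings, and items are hypothetical; substitute your own.
from collections import Counter

# Step 2: the specification maps each domain area to its target weighting.
spec = {"gist": 0.25, "specific_information": 0.35,
        "inference": 0.25, "vocabulary": 0.15}

# Step 3: each drafted item is tagged with the area it samples.
items = ["gist", "specific_information", "specific_information",
         "inference", "gist", "vocabulary", "specific_information"]

# Step 4: evaluate whether the sample covers the specification.
observed = Counter(items)
for area, target in spec.items():
    share = observed[area] / len(items)
    status = ("MISSING" if observed[area] == 0
              else f"{share:.0%} of items (target {target:.0%})")
    print(f"{area}: {status}")
```

Nothing here replaces expert judgment; the tally only makes gaps and imbalances visible so that judgment has something concrete to work on.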
How Content Validity Differs from Construct and Face Validity
| Aspect | Content Validity | Construct Validity | Face Validity |
|---|---|---|---|
| Question | Does the test sample the domain adequately? | Does it measure the target ability? | Does it look right? |
| Method | Expert review against specifications | Statistical and theoretical analysis | Stakeholder impression |
| Judgments by | Subject matter experts, test developers | Researchers, psychometricians | Test-takers, administrators |
| Limitations | Cannot show whether the test measures the intended processes | May miss domain sampling gaps | No technical basis |
Content validity contributes evidence toward construct validity but does not guarantee it. A test can sample content broadly (strong content validity) but still fail to measure the target ability — for instance, if all items test recognition rather than production, the content coverage is broad but the construct is narrowly operationalised.
Establishing Content Validity
Test Specifications
A clear test specification (blueprint) defines:
- Skills and knowledge areas to be tested, with proportional weighting
- Item types and their relationship to the construct
- Text types and topics to be included
- Difficulty distribution across items
Example for an EFL end-of-course Reading and Listening test:
| Component | Proportion | Skills assessed |
|---|---|---|
| Listening — monologue | 25% | Gist, specific information, inference |
| Listening — dialogue | 25% | Specific information, attitude, opinion |
| Reading — long text | 30% | Main idea, detail, inference, vocabulary in context |
| Reading — short texts | 20% | Scanning, matching, specific information |
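A blueprint like this also fixes the arithmetic of the paper. The sketch below converts the proportions in the table above into item counts and confirms the weightings sum to 100%; the 40-item test length is an assumption chosen purely for illustration.

```python
# Convert the blueprint proportions into item counts.
blueprint = {
    "Listening - monologue": 0.25,
    "Listening - dialogue": 0.25,
    "Reading - long text": 0.30,
    "Reading - short texts": 0.20,
}

# Weightings that do not sum to 100% mean the specification itself is inconsistent.
assert abs(sum(blueprint.values()) - 1.0) < 1e-9, "proportions must sum to 100%"

total_items = 40  # assumed test length, for illustration only
for component, proportion in blueprint.items():
    print(f"{component}: {round(total_items * proportion)} items")
```

With test lengths that do not divide evenly, the rounded counts can drift slightly from the total, so round first and then adjust by hand to preserve the intended balance.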
Expert Review
Independent experts evaluate items against the specification. Questions to ask (adapted from Hughes 2003):
- Does the test cover all major areas of the specification?
- Is any area over- or under-represented?
- Are the topics and text types varied enough?
- Does the difficulty range match the target learner population?
- Are there important areas of the domain that are completely missing?
Content Validity for Achievement Tests
Content validity is especially critical for achievement tests, because their purpose is to assess mastery of specific course content. An achievement test that omits key course objectives or over-represents minor topics has weak content validity — and produces invalid conclusions about learning.
The alignment chain should be:
Course objectives → Syllabus content → Test specification → Test items
Breakdowns at any point weaken content validity. Common failures:
- The teacher emphasised vocabulary but the test is grammar-heavy
- The course taught writing and speaking but the test only assesses reading and listening
- The test samples only the final unit, ignoring earlier material
- The weighting of items does not match the weighting of teaching time (see the sketch after this list)
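The last failure in this list is easy to check mechanically. The sketch below (with invented topic names, hours, and item counts) compares each topic's share of teaching time with its share of test items and flags divergences beyond a chosen tolerance.

```python
# Hypothetical alignment check: does item weighting track teaching time?
# Topic names and numbers are invented for illustration.
teaching_hours = {"grammar": 10, "vocabulary": 20, "reading": 10}
items_per_topic = {"grammar": 25, "vocabulary": 5, "reading": 10}

total_hours = sum(teaching_hours.values())
total_items = sum(items_per_topic.values())

TOLERANCE = 0.10  # flag divergences above 10 percentage points

for topic in teaching_hours:
    taught = teaching_hours[topic] / total_hours
    tested = items_per_topic.get(topic, 0) / total_items
    flag = "  <-- misaligned" if abs(taught - tested) > TOLERANCE else ""
    print(f"{topic}: taught {taught:.0%}, tested {tested:.0%}{flag}")
```

The 10-point tolerance is arbitrary; what counts as acceptable drift between teaching and testing is itself a judgment for the course team.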
Content Validity for Proficiency Tests
For proficiency tests, the content domain is defined by a theory of language ability rather than a syllabus. IELTS, for example, must represent the domain of "academic English ability" through its choice of text types, task types, topics, and scoring criteria. Content validity questions include:
- Are the text types representative of what learners will encounter at university?
- Do the writing tasks sample different aspects of academic writing?
- Is the range of accents in the listening test representative of English as used internationally?
Why It Matters
Content validity is the most accessible form of validity evidence for classroom teachers. You do not need statistics — you need a clear specification and the willingness to check your test against it.
Practical implications:
- Before writing a test: Create a specification that maps to your course objectives. Decide how many items per skill area.
- After writing a test: Check each item against the specification. Mark which objective each item assesses. Look for gaps and over-representation.
- After administering a test: If learners perform poorly, check whether the test actually covered what was taught — a content validity failure, not a learning failure.
Key References
- Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge University Press.
- Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford University Press.
- Brown, H. D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). Pearson.
- Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). Macmillan.