Content Validity
Content validity is the extent to which a test's items and tasks adequately sample the content domain the test claims to cover. It asks: Does this test represent the full range of knowledge, skills, and abilities that define the domain?
Unlike construct validity, which rests on statistical and theoretical analysis, content validity is established primarily through expert judgment: systematic evaluation of how well test content maps to the domain specification.
The Domain Sampling Problem
No test can assess everything within a domain. A "grammar test" cannot test every structure; a "reading test" cannot use every possible text type. Content validity is about whether the sample is representative of the population of possible items and tasks.
Hughes (2003) frames it as a specification-matching exercise:
- Define the content domain (what should be assessed)
- Create a test specification that maps the domain
- Write items that sample from the specification
- Evaluate whether the sample is representative and balanced (a sketch of this check follows the list)
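This mapping lends itself to a simple audit. The sketch below (Python, with hypothetical area names, weightings, and items) tags each drafted item with the specification area it samples, then reports which areas are missing and how the observed proportions compare with the targets.

```python
# Minimal sketch of Hughes' specification-matching exercise.
# Area names, weightings, and items are hypothetical; substitute your own.
from collections import Counter

# Step 2: the specification maps each domain area to its target weighting.
spec = {"gist": 0.25, "specific_information": 0.35,
        "inference": 0.25, "vocabulary": 0.15}

# Step 3: each drafted item is tagged with the area it samples.
items = ["gist", "specific_information", "specific_information",
         "inference", "gist", "vocabulary", "specific_information"]

# Step 4: evaluate whether the sample covers the specification.
observed = Counter(items)
for area, target in spec.items():
    share = observed[area] / len(items)
    status = ("MISSING" if observed[area] == 0
              else f"{share:.0%} of items (target {target:.0%})")
    print(f"{area}: {status}")
```

Nothing here replaces expert judgment; the tally only makes gaps and imbalances visible so that judgment has something concrete to work on.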
How Content Validity Differs from Construct and Face Validity
| Aspect | Content Validity | Construct Validity | Face Validity |
|---|---|---|---|
| Question | Does the test sample the domain adequately? | Does it measure the target ability? | Does it look right? |
| Method | Expert review against specifications | Statistical and theoretical analysis | Stakeholder impression |
| Judgments by | Subject matter experts, test developers | Researchers, psychometricians | Test-takers, administrators |
| Limitations | Cannot show whether the test measures the intended processes | May miss domain sampling gaps | No technical basis |
Content validity contributes evidence toward construct validity but does not guarantee it. A test can sample content broadly (strong content validity) but still fail to measure the target ability — for instance, if all items test recognition rather than production, the content coverage is broad but the construct is narrowly operationalised.
Establishing Content Validity
Test Specifications
A clear test specification (blueprint) defines:
- Skills and knowledge areas to be tested, with proportional weighting
- Item types and their relationship to the construct
- Text types and topics to be included
- Difficulty distribution across items
Example for an EFL end-of-course Reading and Listening test:
| Component | Proportion | Skills assessed |
|---|---|---|
| Listening — monologue | 25% | Gist, specific information, inference |
| Listening — dialogue | 25% | Specific information, attitude, opinion |
| Reading — long text | 30% | Main idea, detail, inference, vocabulary in context |
| Reading — short texts | 20% | Scanning, matching, specific information |
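A blueprint like this also fixes the arithmetic of the paper. The sketch below converts the proportions in the table above into item counts and confirms the weightings sum to 100%; the 40-item test length is an assumption chosen purely for illustration.

```python
# Convert the blueprint proportions into item counts.
blueprint = {
    "Listening - monologue": 0.25,
    "Listening - dialogue": 0.25,
    "Reading - long text": 0.30,
    "Reading - short texts": 0.20,
}

# Weightings that do not sum to 100% mean the specification itself is inconsistent.
assert abs(sum(blueprint.values()) - 1.0) < 1e-9, "proportions must sum to 100%"

total_items = 40  # assumed test length, for illustration only
for component, proportion in blueprint.items():
    print(f"{component}: {round(total_items * proportion)} items")
```

With test lengths that do not divide evenly, the rounded counts can drift slightly from the total, so round first and then adjust by hand to preserve the intended balance.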
Expert Review
Independent experts evaluate items against the specification. Questions to ask (adapted from Hughes 2003):
- Does the test cover all major areas of the specification?
- Is any area over- or under-represented?
- Are the topics and text types varied enough?
- Does the difficulty range match the target learner population?
- Are there important areas of the domain that are completely missing?
Content Validity for Achievement Tests
Content validity is especially critical for achievement tests, because their purpose is to assess mastery of specific course content. An achievement test that omits key course objectives or over-represents minor topics has weak content validity — and produces invalid conclusions about learning.
The alignment chain should be:
Course objectives → Syllabus content → Test specification → Test items
Breakdowns at any point weaken content validity. Common failures:
- The teacher emphasised vocabulary but the test is grammar-heavy
- The course taught writing and speaking but the test only assesses reading and listening
- The test samples only the final unit, ignoring earlier material
- The weighting of items does not match the weighting of teaching time (see the sketch after this list)
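The last failure in this list is easy to check mechanically. The sketch below (with invented topic names, hours, and item counts) compares each topic's share of teaching time with its share of test items and flags divergences beyond a chosen tolerance.

```python
# Hypothetical alignment check: does item weighting track teaching time?
# Topic names and numbers are invented for illustration.
teaching_hours = {"grammar": 10, "vocabulary": 20, "reading": 10}
items_per_topic = {"grammar": 25, "vocabulary": 5, "reading": 10}

total_hours = sum(teaching_hours.values())
total_items = sum(items_per_topic.values())

TOLERANCE = 0.10  # flag divergences above 10 percentage points

for topic in teaching_hours:
    taught = teaching_hours[topic] / total_hours
    tested = items_per_topic.get(topic, 0) / total_items
    flag = "  <-- misaligned" if abs(taught - tested) > TOLERANCE else ""
    print(f"{topic}: taught {taught:.0%}, tested {tested:.0%}{flag}")
```

The 10-point tolerance is arbitrary; what counts as acceptable drift between teaching and testing is itself a judgment for the course team.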
Content Validity for Proficiency Tests
For proficiency tests, the content domain is defined by a theory of language ability rather than a syllabus. IELTS, for example, must represent the domain of "academic English ability" through its choice of text types, task types, topics, and scoring criteria. Content validity questions include:
- Are the text types representative of what learners will encounter at university?
- Do the writing tasks sample different aspects of academic writing?
- Is the range of accents in the listening test representative of English as used internationally?
Why It Matters
Content validity is the most accessible form of validity evidence for classroom teachers. You do not need statistics — you need a clear specification and the willingness to check your test against it.
Practical implications:
- Before writing a test: Create a specification that maps to your course objectives. Decide how many items per skill area.
- After writing a test: Check each item against the specification. Mark which objective each item assesses. Look for gaps and over-representation.
- After administering a test: If learners perform poorly, check whether the test actually covered what was taught — a content validity failure, not a learning failure.
Key References
- Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge University Press.
- Bachman, L. F., & Palmer, A. S. (1996). Language Testing in Practice. Oxford University Press.
- Brown, H. D., & Abeywickrama, P. (2010). Language Assessment: Principles and Classroom Practices (2nd ed.). Pearson.
- Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). Macmillan.