Authenticity
In language assessment, authenticity refers to the degree of correspondence between the characteristics of a test task and the features of real-world language use tasks. Bachman & Palmer (1996) define it formally as a test quality alongside validity, reliability, and practicality.
Bachman & Palmer's Framework
Authenticity is the match between test task characteristics and target language use (TLU) domain task characteristics. The TLU domain is any specific real-world setting outside the test in which the candidate must perform language use tasks, e.g., academic lectures, workplace emails, airport announcements.
Task characteristics compared across test and TLU domain include:
- Setting: physical conditions, participants, time constraints
- Input: format, length, language, topic, genre
- Expected response: type, length, language functions required
- Relationship between input and response: reactivity (reciprocal, non-reciprocal, or adaptive), scope, directness
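The comparison can be made concrete with a small sketch. This is an illustration only, not a procedure from Bachman & Palmer: the `TaskCharacteristics` record, its four simplified fields, the `mismatches` helper, and the example values are all hypothetical, and a real analysis would describe each characteristic in far more detail.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskCharacteristics:
    """Hypothetical, heavily simplified description of one task."""
    setting: str              # physical conditions, participants, timing
    input_format: str         # format and genre of the input
    expected_response: str    # type of response required
    input_response_link: str  # e.g. reciprocal vs. non-reciprocal


def mismatches(test_task, tlu_task):
    """Return the names of the characteristics on which the two tasks differ."""
    return [
        f.name
        for f in fields(TaskCharacteristics)
        if getattr(test_task, f.name) != getattr(tlu_task, f.name)
    ]


# Assumed example: replying to a workplace email vs. a multiple-choice item.
tlu = TaskCharacteristics(
    "office, untimed", "written email", "written email reply", "reciprocal"
)
test = TaskCharacteristics(
    "exam hall, timed", "written email", "multiple choice", "non-reciprocal"
)

print(mismatches(test, tlu))
# ['setting', 'expected_response', 'input_response_link']
```

Here only the input matches the TLU task; the listed mismatches point to where the test task could be redesigned to improve the correspondence.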
Two complementary axes
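Authenticity is often split into two complementary qualities, a distinction that goes back to Bachman (1991); Bachman & Palmer (1996) reserve the term authenticity for the first and call the second interactiveness. Situational authenticity is the surface match between test-task and TLU-task characteristics: the test looks like the real-world task. Interactional authenticity is the engagement match: the task draws on the candidate's language ability, topical knowledge, and affective resources in the way real language use does. Both are necessary: a task with only situational authenticity can pass surface review yet fail to engage the abilities it claims to measure, while a task with only interactional authenticity engages those abilities but does not resemble the target task.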
Why It Matters
A test with high authenticity:
- Engages the same language abilities used in the real world (construct relevance)
- Produces positive washback: teaching to the test means teaching real-world skills
- Has face validity: test-takers perceive the tasks as relevant and fair
A test with low authenticity may still be valid if it measures the right construct, but it risks negative washback and low stakeholder acceptance.
Practical Considerations
Perfect authenticity is impossible; tests necessarily simplify, standardise, and constrain. The goal is sufficient correspondence on the task characteristics that matter most for the construct being measured. Bachman & Palmer's framework provides a systematic checklist for evaluating and improving the match.
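One way to picture "sufficient correspondence on the characteristics that matter most" is a weighted tally. The sketch below is an assumption for illustration: Bachman & Palmer do not prescribe a numerical score, and the function name, the weights, and the match judgements are all hypothetical.

```python
def weighted_correspondence(matches, weights):
    """Weighted share of matching characteristics, from 0.0 to 1.0.

    matches: characteristic name -> True if test and TLU task agree on it
    weights: characteristic name -> assumed importance for the construct
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be positive")
    matched = sum(w for name, w in weights.items() if matches.get(name, False))
    return matched / total


# Hypothetical judgements for one test task against its TLU task.
matches = {
    "setting": False,
    "input": True,
    "expected_response": True,
    "input_response_relationship": False,
}
# Hypothetical weights: expected response matters most for this construct.
weights = {
    "setting": 1,
    "input": 3,
    "expected_response": 4,
    "input_response_relationship": 2,
}

score = weighted_correspondence(matches, weights)
print(score)  # 0.7
```

The weighting makes the trade-off explicit: a mismatch in setting costs little here, whereas a mismatch in expected response would remove most of the score.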