Adaptive Testing


Adaptive testing — most commonly Computerised Adaptive Testing (CAT) — is a form of assessment in which item difficulty adjusts dynamically based on the test taker's responses. A correct answer triggers a harder item; an incorrect answer triggers an easier one. The algorithm converges on a precise estimate of ability in fewer items than a traditional fixed-form test.

How It Works

  1. The test begins with an item of medium difficulty.
  2. After each response, the algorithm re-estimates the test taker's ability level.
  3. The next item is selected to maximise information at that estimated level — not too easy, not too hard.
  4. The process continues until a stopping criterion is met: a predetermined number of items, a confidence threshold for the ability estimate, or a time limit.
  5. The final score is the ability estimate, not a raw count of correct answers.

The key insight: items that are far too easy or far too hard for a given test taker provide almost no information about their ability. Adaptive testing eliminates these uninformative items, making the test shorter and more precise.
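
As a rough illustration of this loop, here is a minimal sketch in Python. It assumes a simple Rasch (one-parameter) model, a small simulated item bank, and grid-search ability estimation; the difficulties, grid, and stopping rule are illustrative choices, not any operational algorithm.

  import math
  import random

  GRID = [g / 10 for g in range(-40, 41)]            # candidate ability values from -4.0 to 4.0

  def p_correct(ability, difficulty):
      # Rasch model: P(correct) = 1 / (1 + exp(-(ability - difficulty)))
      return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

  def estimate_ability(responses):
      # Step 2: re-estimate ability as the grid point most likely to have produced
      # the responses seen so far (a crude maximum-likelihood estimate).
      def log_likelihood(theta):
          return sum(math.log(p_correct(theta, b)) if correct
                     else math.log(1.0 - p_correct(theta, b))
                     for b, correct in responses)
      return max(GRID, key=log_likelihood)

  def run_cat(item_bank, true_ability, max_items=20):
      remaining = list(item_bank)
      responses = []                                   # (difficulty, correct) pairs
      ability = 0.0                                    # Step 1: start at medium difficulty
      while remaining and len(responses) < max_items:  # Step 4: stop after a fixed number of items
          # Step 3: for the Rasch model, the most informative item is the one whose
          # difficulty is closest to the current ability estimate.
          item = min(remaining, key=lambda b: abs(b - ability))
          remaining.remove(item)
          correct = random.random() < p_correct(true_ability, item)   # simulated response
          responses.append((item, correct))
          ability = estimate_ability(responses)
      return ability                                   # Step 5: the score is the ability estimate

  bank = [b / 4 for b in range(-12, 13)]               # 25 items, difficulties from -3.0 to 3.0
  print(run_cat(bank, true_ability=1.2))

A real CAT would typically also stop once the standard error of the ability estimate falls below a threshold, and its item bank would be calibrated from real response data.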

Item Response Theory (IRT)

CAT is built on Item Response Theory, a psychometric framework that models the probability of a correct response as a function of the test taker's ability and the item's characteristics (difficulty, discrimination, and guessing parameters). IRT provides the mathematical foundation for:

  • Item calibration: Placing all items on a common difficulty scale
  • Ability estimation: Scoring test takers on a common ability scale regardless of which items they received
  • Item selection: Choosing the most informative item for the current ability estimate

This contrasts with Classical Test Theory, which treats all items equally and requires all test takers to receive the same items.
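
To make the item-level machinery concrete, here is a small sketch of the three-parameter logistic (3PL) model and its item information function, which underlies the "most informative item" idea above; the parameter values are invented for illustration.

  import math

  def p_3pl(theta, a, b, c):
      # Probability of a correct response under the 3PL model, given ability theta,
      # discrimination a, difficulty b, and guessing parameter c.
      return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

  def item_information(theta, a, b, c):
      # Fisher information the item provides about theta; adaptive item selection
      # typically favours the item that maximises this at the current estimate.
      p = p_3pl(theta, a, b, c)
      return (a ** 2) * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

  # A well-targeted item is far more informative than one that is much too hard:
  print(item_information(theta=0.0, a=1.2, b=0.1, c=0.2))   # roughly 0.23
  print(item_information(theta=0.0, a=1.2, b=3.0, c=0.2))   # roughly 0.004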

Advantages

  • Efficiency: Fewer items needed — CATs are typically 30–50% shorter than fixed-form equivalents
  • Precision: Each test taker receives items targeted at their level, yielding more accurate scores
  • Security: No two tests are identical, reducing cheating and item exposure
  • Reduced anxiety: Fewer impossibly hard or insultingly easy items — the test feels appropriately challenging
  • Immediate results: Computer-based delivery enables instant scoring

Applications in Language Testing

  • Duolingo English Test: A prominent fully computer-based, adaptive language test used for university admissions (see Programmed Instruction).
  • TOEFL iBT: Uses section-level adaptivity in some forms.
  • Placement testing: CAT is particularly valuable for Placement Testing — it efficiently sorts learners across a wide proficiency range.
  • Progress testing: Adaptive formats allow repeated testing without item memorisation effects.

Limitations

  • Item bank requirements: CAT needs a large, well-calibrated item bank. Developing and maintaining this is expensive and time-consuming.
  • Item types: CAT works best with objectively scored items (multiple choice, gap-fill). Adaptive algorithms for productive skills (writing, speaking) are still developing.
  • Technology dependence: Requires reliable computers and internet — not available in all contexts.
  • Transparency: Test takers cannot review or change earlier answers, which some find unfair, and the algorithm's item-selection decisions are opaque.
  • Construct coverage: Because item selection is driven by information gain, the test may under-sample certain content areas that are less discriminating.
  • Test-taking strategies: Strategies that work on fixed-form tests (skipping hard items and returning to them later) do not apply.

Fairness and Bias

Adaptive testing can mitigate some forms of Test Bias — test takers spend less time on inappropriate items, and the adaptive algorithm treats each person individually. However, bias can still exist in the item bank itself. If items are culturally biased, the adaptive algorithm will simply deliver biased items more efficiently. IRT-based differential item functioning (DIF) analysis is used to detect and remove such items.
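
As a loose illustration of the screening idea (not a full IRT-based DIF procedure), suppose each item's difficulty has already been calibrated separately for a reference group and a focal group; flagging items whose estimates diverge captures the basic logic. The threshold below is an arbitrary illustrative value.

  def flag_dif_candidates(item_difficulties, threshold=0.5):
      # item_difficulties maps item id -> (b_reference, b_focal), i.e. IRT difficulty
      # estimates calibrated separately for the two groups. Items whose difficulty
      # differs markedly between groups are candidates for review, not automatic removal.
      flagged = []
      for item_id, (b_ref, b_focal) in item_difficulties.items():
          if abs(b_ref - b_focal) > threshold:
              flagged.append((item_id, round(b_focal - b_ref, 2)))
      return flagged

  estimates = {
      "item_01": (-0.20, -0.15),   # behaves similarly for both groups
      "item_02": (0.80, 1.60),     # notably harder for the focal group: review for bias
  }
  print(flag_dif_candidates(estimates))    # [('item_02', 0.8)]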

The Future

Multistage adaptive testing (MST), where test takers complete short fixed modules and the next module is selected adaptively, is a compromise that retains some adaptivity while allowing item review within each module. AI-driven adaptive assessment — incorporating natural language processing for scoring productive skills — is an active area of development.
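
A minimal sketch of the routing step in a multistage design might look like the following, where the cut scores and module names are purely illustrative:

  def route_next_module(stage_one_score, cut_low=4, cut_high=8):
      # After the fixed stage-one module is scored, the test taker is routed to one
      # of several pre-assembled stage-two modules; adaptivity happens between
      # modules, so answers can still be reviewed within each module.
      if stage_one_score <= cut_low:
          return "stage_two_easy"
      if stage_one_score >= cut_high:
          return "stage_two_hard"
      return "stage_two_medium"

  print(route_next_module(9))   # -> 'stage_two_hard'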

Related Terms