Classical Test Theory


Classical Test Theory (CTT) is the traditional framework for understanding measurement in testing. Its central equation is deceptively simple:

$X = T + E$

where X is the observed score, T is the true score (the hypothetical error-free score), and E is the error of measurement. Every observed score contains some degree of random error — CTT provides the tools to estimate how much.

CTT has dominated language testing and educational measurement since the early 20th century. While modern approaches like Item Response Theory (IRT) address some of its limitations, CTT remains the practical foundation for most classroom and institutional testing because of its computational simplicity and modest sample-size requirements.

Core Assumptions

The true score is the expected value of the observed score across an infinite number of independent administrations of the same test
Error scores are random — they have an expected value of zero
Errors are uncorrelated with true scores — knowing someone's true ability tells you nothing about their error
Errors across tests are uncorrelated — error on one test is unrelated to error on another

These assumptions are untestable in practice (we never observe true scores), but they provide a workable framework for estimating reliability and measurement error.
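A short worked step shows why the third assumption matters. Writing $\sigma^2_E$ for the variance of the error scores (a symbol introduced here for illustration), the variance of $X = T + E$ expands as:

$\sigma^2_X = \sigma^2_T + \sigma^2_E + 2\,\mathrm{Cov}(T, E)$

Because errors are assumed to be uncorrelated with true scores, $\mathrm{Cov}(T, E) = 0$, which leaves:

$\sigma^2_X = \sigma^2_T + \sigma^2_E$

This additive split is what allows observed-score variance to be divided into a true-score share and an error share.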

Key Concepts in CTT

Reliability

Reliability in CTT is defined as the proportion of observed score variance attributable to true score variance:

$r_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}$

A reliability coefficient of 0.85 means 85% of the variance in observed scores reflects real differences between test takers; 15% is error. Common estimation methods include:

Method	What it estimates
Test-retest	Stability over time
Parallel forms	Equivalence across test versions
Split-half	Internal consistency (single administration)
Cronbach's alpha	Internal consistency — the most widely reported
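
As a rough illustration of the two single-administration methods in the table, the sketch below computes Cronbach's alpha and an odd-even split-half coefficient stepped up with the Spearman-Brown formula. The function names and the small response matrix are hypothetical and chosen only for this example; real analyses would use far more test takers and items.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items matrix (list of rows)."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    half_a = [sum(row[0::2]) for row in scores]  # items 1, 3, ...
    half_b = [sum(row[1::2]) for row in scores]  # items 2, 4, ...
    r = pearson(half_a, half_b)
    return 2 * r / (1 + r)  # estimate for the full-length test


# Hypothetical data: 5 test takers (rows) x 4 right/wrong items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(f"alpha      = {cronbach_alpha(responses):.2f}")  # alpha      = 0.80
print(f"split-half = {split_half(responses):.2f}")      # split-half = 0.88
```

The two coefficients differ because split-half depends on how the items happen to be divided, whereas alpha does not depend on any one particular split.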

Standard Error of Measurement (SEM)

The SEM estimates how much an individual's observed score might vary due to measurement error:

$SEM = \sigma_X \sqrt{1 - r_{XX'}}$

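A test with SD = 10 and reliability = 0.84 has SEM = 10 × √0.16 = 4. A candidate scoring 65 therefore has a true score that falls between 61 and 69 (±1 SEM) with roughly 68% confidence, or between about 57 and 73 (±1.96 SEM) with roughly 95% confidence, assuming normally distributed errors. This has direct implications for cut score decisions: candidates near the boundary may be misclassified by chance.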
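
The arithmetic above can be sketched in a few lines. The function names `sem` and `score_band` are hypothetical, and the band is centred on the observed score and assumes normally distributed errors, which is a common simplification rather than the only way to build such an interval.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.0):
    """Band of +/- z SEMs around an observed score (z=1.96 for ~95%)."""
    half_width = z * sem(sd, reliability)
    return observed - half_width, observed + half_width


# Values from the example in the text: SD = 10, reliability = 0.84.
print(f"SEM = {sem(10, 0.84):.2f}")

low, high = score_band(65, 10, 0.84)           # +/- 1 SEM
print(f"68% band: {low:.1f} to {high:.1f}")

low, high = score_band(65, 10, 0.84, z=1.96)   # +/- 1.96 SEM
print(f"95% band: {low:.1f} to {high:.1f}")
```

If a cut score falls inside a candidate's band, the pass/fail decision for that candidate cannot be separated from measurement error with much confidence.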

Item Statistics

CTT provides the two core item-level statistics used in item analysis:

Item Difficulty (p-value) — proportion answering correctly
Item Discrimination (D or point-biserial) — how well the item separates strong from weak candidates

Both are straightforward to calculate and interpret, which is why CTT-based item analysis is standard practice even in contexts where IRT would be theoretically preferable.
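
A minimal sketch of both statistics follows, using a small hypothetical response matrix. The function names are illustrative. The point-biserial here correlates each item with the total score including that item; many analyses instead use a corrected version that removes the item from the total first.

```python
from math import sqrt
from statistics import mean


def item_difficulty(scores, item):
    """Proportion of test takers answering the item correctly (p-value)."""
    return mean([row[item] for row in scores])


def point_biserial(scores, item):
    """Correlation between a 0/1 item and the (uncorrected) total score."""
    x = [row[item] for row in scores]
    y = [sum(row) for row in scores]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # undefined when everyone has the same score
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 5 test takers (rows) x 4 right/wrong items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for i in range(len(responses[0])):
    p = item_difficulty(responses, i)
    r_pb = point_biserial(responses, i)
    print(f"item {i + 1}: p = {p:.2f}, r_pb = {r_pb:.2f}")
```

Running the same functions on a different group of test takers would generally give different values for the same items, which is the sample dependence discussed under Limitations.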

Limitations

Limitation	Explanation
Sample dependence	Item statistics (difficulty, discrimination) depend on the sample tested — the same item looks different with different groups
Test dependence	Person ability estimates depend on which items were administered
Equal error assumption	The standard SEM is a single value applied to every test taker regardless of ability; in reality, tests usually measure more precisely in the middle of the score range than at the extremes
Total score focus	CTT works with total scores; it cannot model the probability of a specific response to a specific item

IRT addresses all four limitations by modelling the relationship between item parameters and person ability on a common scale, so that, when the model fits the data, estimates do not depend on the particular sample or test form. However, IRT requires larger samples (typically 200+, and more for models with more parameters per item) and specialised software.

CTT vs IRT in Practice

For most language teaching contexts — classroom tests, progress tests, placement tests, institutional achievement exams — CTT is sufficient and practical. IRT is essential for large-scale, high-stakes testing programmes (IELTS, TOEFL, Cambridge exams) where test equating, adaptive testing, and item banking across forms are required.
