ELTiverse

Classical Test Theory

Assessment, CTT

Classical Test Theory (CTT) is the traditional framework for understanding measurement in testing. Its central equation is deceptively simple:

$$X = T + E$$

where X is the observed score, T is the true score (the hypothetical error-free score), and E is the error of measurement. Every observed score contains some degree of random error — CTT provides the tools to estimate how much.

CTT has dominated language testing and educational measurement since the early 20th century. While modern approaches like Item Response Theory (IRT) address some of its limitations, CTT remains the practical foundation for most classroom and institutional testing because of its computational simplicity and modest sample-size requirements.

Core Assumptions

  1. The true score is the expected value of the observed score across an infinite number of independent administrations of the same test
  2. Error scores are random — they have an expected value of zero
  3. Errors are uncorrelated with true scores — knowing someone's true ability tells you nothing about their error
  4. Errors across tests are uncorrelated — error on one test is unrelated to error on another

These assumptions are untestable in practice (we never observe true scores), but they provide a workable framework for estimating reliability and measurement error.
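Although true scores are never observed in real data, a simulation can make the assumptions concrete, because there we choose the true scores ourselves. The sketch below is illustrative only: the normal distributions, the means and standard deviations, and all function and variable names are assumptions made for the example, not part of CTT itself.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def simulate_ctt(n_people=20000, true_mean=60.0, true_sd=8.0,
                 error_sd=4.0, seed=1):
    """Generate hypothetical true scores and errors for two tests (A and B)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_people)]
    errors_a = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    errors_b = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    # Observed score on test A follows the central equation X = T + E.
    observed_a = [t + e for t, e in zip(true_scores, errors_a)]
    return true_scores, errors_a, errors_b, observed_a


true_scores, errors_a, errors_b, observed_a = simulate_ctt()

mean_error = statistics.fmean(errors_a)            # assumption 2: close to 0
corr_true_error = pearson(true_scores, errors_a)   # assumption 3: close to 0
corr_errors = pearson(errors_a, errors_b)          # assumption 4: close to 0

# With uncorrelated T and E, var(X) is approximately var(T) + var(E).
var_ratio = statistics.pvariance(observed_a) / (
    statistics.pvariance(true_scores) + statistics.pvariance(errors_a)
)

print(round(mean_error, 3), round(corr_true_error, 3),
      round(corr_errors, 3), round(var_ratio, 3))
```

In this simulated setting the three checks hover near zero and the variance ratio near one; with real test data only the observed scores would be available, which is why the assumptions cannot be checked directly.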

Key Concepts in CTT

Reliability

Reliability in CTT is defined as the proportion of observed score variance attributable to true score variance:

$$r_{XX'} = \frac{\sigma^2_T}{\sigma^2_X}$$

A reliability coefficient of 0.85 means 85% of the variance in observed scores reflects real differences between test takers; 15% is error. Common estimation methods include:

| Method | What it estimates |
| --- | --- |
| Test-retest | Stability over time |
| Parallel forms | Equivalence across test versions |
| Split-half | Internal consistency (single administration) |
| Cronbach's alpha | Internal consistency (the most widely reported estimate) |
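As an illustration of the most widely reported of these estimates, Cronbach's alpha can be computed from a single administration using the standard formula α = k/(k − 1) × (1 − Σσ²ᵢ / σ²_X), where k is the number of items, σ²ᵢ the variance of item i, and σ²_X the variance of total scores. The sketch below is minimal; the function name and the tiny response matrix are hypothetical, and a real analysis would need far more test takers.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of item scores.

    scores: list of rows, one per test taker, each a list of k item scores.
    """
    k = len(scores[0])
    # Variance of each item across test takers (sample variance throughout).
    item_vars = [statistics.variance([row[i] for row in scores])
                 for i in range(k)]
    # Variance of the total scores.
    total_var = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 6 test takers x 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # roughly 0.833 for this toy data
```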

Standard Error of Measurement (SEM)

The SEM estimates how much an individual's observed score might vary due to measurement error:

$$SEM = \sigma_X \sqrt{1 - r_{XX'}}$$

A test with SD = 10 and reliability = 0.84 has SEM = 10 × √(1 − 0.84) = 10 × 0.4 = 4. A candidate scoring 65 therefore has a true score that likely falls between 61 and 69 (±1 SEM, roughly a 68% band if errors are normally distributed). This has direct implications for cut score decisions: candidates near the boundary may be misclassified by chance.
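The arithmetic in this example can be reproduced in a few lines. The sketch below is only an illustration: the function names and the cut score of 67 are hypothetical, chosen to show how a ±1 SEM band can straddle a pass mark.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed, sem, n_sem=1.0):
    """Return the (lower, upper) band of observed +/- n_sem * SEM."""
    return observed - n_sem * sem, observed + n_sem * sem


sem = standard_error_of_measurement(sd=10, reliability=0.84)  # about 4.0
lower, upper = score_band(observed=65, sem=sem)               # about 61 to 69

# Hypothetical cut score: the band straddles it, so a pass/fail decision
# for this candidate could go either way on measurement error alone.
cut_score = 67
straddles_cut = lower <= cut_score <= upper

print(round(sem, 2), round(lower, 2), round(upper, 2), straddles_cut)
```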

Item Statistics

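CTT provides the two core item-level statistics used in item analysis: item difficulty (how hard the item was for the group tested) and item discrimination (how well the item separates stronger from weaker test takers).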

Both are straightforward to calculate and interpret, which is why CTT-based item analysis is standard practice even in contexts where IRT would be theoretically preferable.
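To show how simple these calculations are, the sketch below computes one common version of each statistic, taking the two statistics to be item difficulty and item discrimination (the labels used under Limitations). Difficulty is computed as the proportion answering correctly (the facility value), and discrimination as the upper-lower index: the facility in the top-scoring group minus the facility in the bottom-scoring group. The choice of index, the one-third group size, the function names, and the data are all assumptions for illustration; point-biserial correlation is another widely used discrimination measure.

```python
def item_facility(responses):
    """Proportion of test takers answering each item correctly."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(k)]


def item_discrimination(responses, fraction=1 / 3):
    """Upper-lower discrimination index for each item.

    Test takers are ranked by total score; the index is the facility in the
    top group minus the facility in the bottom group.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    k = len(responses[0])
    return [
        sum(row[i] for row in upper) / group_size
        - sum(row[i] for row in lower) / group_size
        for i in range(k)
    ]


# Hypothetical data: 6 test takers x 4 dichotomous items (1 = correct).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

facility = item_facility(responses)
discrimination = item_discrimination(responses)
print([round(p, 2) for p in facility])   # [0.83, 0.67, 0.5, 0.33]
print(discrimination)                    # [0.5, 1.0, 1.0, 1.0]
```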

Limitations

| Limitation | Explanation |
| --- | --- |
| Sample dependence | Item statistics (difficulty, discrimination) depend on the sample tested; the same item looks different with different groups |
| Test dependence | Person ability estimates depend on which items were administered |
| Equal error assumption | CTT assumes measurement error is the same for all ability levels; in reality, tests measure more precisely in the middle of the score range |
| Total score focus | CTT works with total scores; it cannot model the probability of a specific response to a specific item |
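Sample dependence is easy to see with a toy example. In the sketch below, the same item is answered by two hypothetical groups; the group names and responses are invented for illustration. The item's difficulty value changes with the group even though the item itself has not changed.

```python
def facility(responses):
    """Proportion of correct responses (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)


# Hypothetical responses to the SAME item from two different groups.
beginner_group = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
advanced_group = [1, 1, 1, 0, 1, 1, 1, 1, 1, 0]

p_beginner = facility(beginner_group)  # 0.3: the item looks hard
p_advanced = facility(advanced_group)  # 0.8: the item looks easy

print(p_beginner, p_advanced)
```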

IRT addresses all four limitations by modelling the relationship between item parameters and person ability on a common scale, independent of the particular sample or test form. However, IRT requires larger samples (typically 200+) and specialised software.

CTT vs IRT in Practice

For most language teaching contexts — classroom tests, progress tests, placement tests, institutional achievement exams — CTT is sufficient and practical. IRT is essential for large-scale, high-stakes testing programmes (IELTS, TOEFL, Cambridge exams) where test equating, adaptive testing, and item banking across forms are required.

