Cloze Test


A cloze test is a reading passage from which words have been deleted at regular intervals (typically every 5th, 6th, or 7th word); the test-taker must restore the missing words. Developed by Wilson Taylor (1953) as a readability measure, it was later adopted into language testing as a measure of integrative language proficiency.

The term "cloze" comes from "closure" — the Gestalt psychological principle that humans naturally fill in gaps to perceive wholes.

Standard Cloze vs Selective Deletion

Fixed-Ratio Cloze (Standard)

Words are deleted at a predetermined interval regardless of what the words are. With a deletion rate of seven, for example, every 7th word is removed, whether it is a function word, a content word, or a proper noun. This is Taylor's original procedure and is what "cloze test" technically refers to.

The mechanical deletion ensures the test samples a representative cross-section of the text's linguistic features — grammar, vocabulary, discourse markers, collocations — without the test writer selecting what to test.
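
The mechanical procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name make_fixed_ratio_cloze, the blank marker, and the choice to count whitespace-separated tokens and keep attached punctuation are all assumptions made for the example.

```python
import re

def make_fixed_ratio_cloze(text, n=7, blank="____"):
    """Blank every n-th whitespace-separated token; return (gapped_text, answer_key).

    Assumption: punctuation attached to a deleted word is kept in the
    gapped text, so only the word itself goes into the answer key.
    """
    gapped, key = [], []
    for position, token in enumerate(text.split(), start=1):
        if position % n == 0:
            # Split the token into leading punctuation, the word, trailing punctuation.
            lead, word, trail = re.fullmatch(r"(\W*)(.*?)(\W*)", token).groups()
            key.append(word)
            gapped.append(f"{lead}{blank}{trail}")
        else:
            gapped.append(token)
    return " ".join(gapped), key


passage = ("one two three four five six seven eight nine ten "
           "eleven twelve thirteen fourteen.")
gapped, key = make_fixed_ratio_cloze(passage, n=7)
print(gapped)
print(key)
```

Because the deletion points depend only on the counter, changing n (or the position where counting begins) yields a different set of items from the same passage.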

Selective Deletion (Modified Cloze)

Specific words are chosen for deletion based on what the test writer wants to assess — particular grammar structures, vocabulary items, or discourse features. This is often called a "modified cloze" or "gap-fill" exercise. It sacrifices the representative sampling of the standard cloze for targeted assessment of specific language areas.

Strictly speaking, a selective deletion task is not a true cloze test (Hughes 2003; Brown & Abeywickrama 2010), though the terms are often conflated in practice.
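
For contrast with fixed-ratio deletion, a selective-deletion task can be sketched as follows. The function name make_selective_cloze and the idea of passing the targets as a plain word list are assumptions for illustration; in practice the test writer picks the targets by hand.

```python
import re

def make_selective_cloze(text, targets, blank="____"):
    """Blank every occurrence of the chosen target words (case-insensitive).

    Returns (gapped_text, answer_key). The targets are whatever the test
    writer wants to assess; nothing is sampled mechanically.
    """
    wanted = {t.lower() for t in targets}
    key = []

    def replace(match):
        word = match.group(0)
        if word.lower() in wanted:
            key.append(word)      # record the original word for the key
            return blank
        return word               # leave non-target words untouched

    gapped = re.sub(r"[A-Za-z']+", replace, text)
    return gapped, key


sentence = "She went to the market because she needed bread, although it was late."
gapped, key = make_selective_cloze(sentence, ["because", "although"])
print(gapped)
print(key)
```

Here the deleted items are all discourse connectors, so the task targets one feature and no longer samples the text representatively.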

C-Test

A variant in which the second half of every second word is deleted, with the first sentence left intact. It was developed by Klein-Braley & Raatz (1984) as a more efficient alternative to traditional cloze. Its advantages are that it is shorter, gives higher reliability per minute of testing time, and is less controversial to score.
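
The C-test deletion rule can be sketched as below. Several details are assumptions not stated above: sentences are split naively on end punctuation, one-letter words are skipped and not counted, counting starts at the first word of the second sentence so that the second countable word is the first one damaged, and words with an odd number of letters lose the larger half. The name make_c_test is hypothetical.

```python
import re

def make_c_test(text, gap_char="_"):
    """Leave the first sentence intact, then remove the second half of
    every second word. Returns (damaged_text, answer_key)."""
    # Naive sentence split on ., ! or ? followed by whitespace (assumption).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    first, rest = sentences[0], sentences[1:]
    key = []
    counter = 0

    def mutilate(match):
        nonlocal counter
        word = match.group(0)
        if len(word) < 2:              # assumption: one-letter words are not counted
            return word
        counter += 1
        if counter % 2 == 1:           # 1st, 3rd, 5th ... countable word stays whole
            return word
        keep = len(word) // 2          # assumption: odd lengths lose the larger half
        key.append(word)
        return word[:keep] + gap_char * (len(word) - keep)

    damaged = [re.sub(r"[A-Za-z]+", mutilate, s) for s in rest]
    return " ".join([first] + damaged), key


text = "This sentence stays whole. The weather was cold today."
damaged, key = make_c_test(text)
print(damaged)
print(key)
```

Since the first letters of each damaged word are given, there is usually far less room for competing answers than in a whole-word gap.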

Scoring Methods

Exact-word scoring. Only the original word is accepted. For "She went to the ___", only the word that actually appeared in the source text earns credit, even if another word would fit the context equally well. This is stricter but more reliable, since no rater judgment is needed.

Acceptable-word scoring. Any contextually appropriate word is accepted. This is more valid (it credits genuine comprehension) but introduces scorer variability and requires an answer key that anticipates alternatives.
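
The two scoring methods differ only in what counts as a match, which a short sketch makes concrete. The function name score_cloze, the answer-key layout, and the sample words ("market", "store", "shop") are hypothetical; the original word for the sample sentence is not given in this article.

```python
def score_cloze(responses, answer_key, acceptable=None):
    """Score one test-taker both ways; return (exact_score, acceptable_score).

    responses  : list of words the test-taker wrote, one per gap
    answer_key : list of the original deleted words
    acceptable : optional dict {gap_index: set of alternative words} that
                 the test writer has decided to credit (assumed format)
    """
    acceptable = acceptable or {}
    exact = accepted = 0
    for index, (given, original) in enumerate(zip(responses, answer_key)):
        answer = given.strip().lower()
        if answer == original.lower():
            exact += 1            # the original word counts under both methods
            accepted += 1
        elif answer in {alt.lower() for alt in acceptable.get(index, ())}:
            accepted += 1         # credited only under acceptable-word scoring
    return exact, accepted


answer_key = ["market", "bought"]            # hypothetical original words
alternatives = {0: {"store", "shop"}}        # hypothetical accepted alternatives
print(score_cloze(["store", "bought"], answer_key, alternatives))
```

The exact-word path needs nothing beyond the original text, while the acceptable-word path depends on a list of alternatives that someone has to compile and defend.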

Research (Oller 1979; Brown 2002) shows that both methods rank test-takers in very nearly the same order: the correlation between exact and acceptable scores is very high (typically r > .95). Exact-word scoring is therefore generally preferred for its practicality, even though it can appear harsh.
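
The correlation referred to here is the Pearson coefficient between each test-taker's exact-word score and acceptable-word score. A minimal sketch of the calculation follows; the function name pearson_r is an assumption, and the score lists are made-up numbers used only to show the call, not research data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return covariance / (spread_x * spread_y)


# Invented scores for five test-takers, for illustration only.
exact_scores = [12, 18, 25, 9, 30]
acceptable_scores = [17, 22, 31, 13, 36]
print(round(pearson_r(exact_scores, acceptable_scores), 3))
```

A value close to 1 means the stricter method lowers everyone's score without changing who comes out ahead of whom.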

What Does a Cloze Test Measure?

This is one of the most debated questions in language testing.

The integrative argument. Oller (1979) argued that cloze tests measure a global "expectancy grammar" — the ability to predict and process language using all available linguistic and contextual cues simultaneously. This made cloze a powerful proficiency measure because it tapped multiple skills at once.

The criticism. Alderson (1979, 1980) demonstrated that most cloze items can be answered using local context only (the immediately surrounding words), not the broader discourse understanding Oller claimed. He also found that different deletion rates produce different tests that measure different things, so the construct being measured is unstable.

The current consensus. Cloze tests primarily measure lower-level reading processes — vocabulary knowledge, grammatical competence, local cohesion processing — and are less effective at measuring higher-level skills like main idea comprehension, inference, or critical evaluation (Alderson 2000; Hughes 2003). They are useful but limited.

Strengths

Easy to construct. Select a passage, delete every nth word, and the test writes itself.
High reliability. Multiple items from a single passage produce strong internal consistency.
Integrative. Tests multiple language features simultaneously, unlike discrete-point items.
Efficient screening tool. Works well for placement or quick proficiency estimation.
Resistant to coaching. Difficult to prepare for with test-taking strategies.
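
The internal consistency mentioned under "High reliability" can be estimated from a matrix of right/wrong item scores. The article does not name a specific coefficient; the sketch below uses KR-20, one common choice for dichotomous items, and the function name kr20 and the data layout (one row per test-taker, one column per gap) are assumptions.

```python
def kr20(item_matrix):
    """Kuder-Richardson 20 for a matrix of 0/1 item scores.

    item_matrix: list of rows, one per test-taker, each a list of 0/1
    values with one entry per cloze item (assumed layout).
    """
    n_people = len(item_matrix)
    n_items = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_people
    # Population variance of the total scores.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    if n_items < 2 or var_total == 0:
        raise ValueError("need at least 2 items and some variation in totals")
    # Sum of p * (1 - p) over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in item_matrix) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / var_total)


# Toy data: four test-takers, three items, perfectly consistent answers.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(kr20(toy))
```

Values nearer 1 indicate that the items tend to separate stronger from weaker test-takers in the same way.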

Limitations

Construct ambiguity. What exactly is being measured? The answer depends on the passage, the deletion rate, and the specific words that happen to fall at deletion points.
Passage dependency. Results vary significantly depending on the text chosen; different passages produce different reliability estimates.
Ceiling/floor effects. Very easy passages produce ceiling effects; very difficult ones produce floor effects. Passage selection is critical.
Not suitable for productive skills. Cloze tests assess receptive processing; they cannot measure writing or speaking ability.
Cultural and topic bias. Background knowledge of the topic strongly influences performance, introducing construct-irrelevant variance.

Why It Matters

The cloze test represents an important moment in the history of language testing — the shift from discrete-point testing (testing one item at a time in isolation) to integrative testing (testing multiple skills through extended text processing). Even though the cloze test's limitations are now well documented, the underlying principle — that language ability is best measured through integrated tasks, not isolated items — shaped the development of modern communicative testing.

In practical terms, cloze-type tasks remain common in language classrooms and tests. Understanding what they can and cannot measure helps teachers choose them appropriately — as useful tools for quick assessment of reading-level vocabulary and grammar processing, not as comprehensive measures of language proficiency.

Key References

Taylor, W. L. (1953). Cloze procedure: A new tool for measuring readability. Journalism Quarterly, 30, 415-433.
Oller, J. W. (1979). Language Tests at School. Longman.
Alderson, J. C. (1979). The cloze procedure and proficiency in English as a foreign language. TESOL Quarterly, 13(2), 219-227.
Alderson, J. C. (2000). Assessing Reading. Cambridge University Press.
Hughes, A. (2003). Testing for Language Teachers (2nd ed.). Cambridge University Press.
Brown, J. D. (2002). Do cloze tests work? Or, is it just an illusion? Second Language Studies, 21(1), 79-125.
Klein-Braley, C., & Raatz, U. (1984). A survey of research on the C-test. Language Testing, 1(2), 134-146.