Practice-Test Congruency
Practice-test congruency is a methodological confound in which the treatment group practices activities structurally similar to the post-test while the control group does not. When the treatment group outperforms the control, the difference may reflect test familiarity rather than genuine learning.
How It Works
Suppose a study compares task-based language teaching (TBLT) with grammar-translation for teaching email writing:
- TBLT group: practices writing emails (communicative tasks)
- Control group: does grammar exercises and translation drills
- Post-test: write an email
The TBLT group has been practicing the test format. Their superior performance may simply show they're better at that particular assessment, not that they've learned more English. The control group might outperform them on a grammar test.
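To make the inflation concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: both groups are given identical latent ability (i.e. no true learning difference), and a hypothetical "format familiarity" bonus of 0.5 SD is added to the TBLT group's scores on the email post-test. The measured effect size is then nonzero even though neither group learned more.

```python
import random
import statistics

random.seed(42)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two samples."""
    pooled = ((statistics.stdev(a) ** 2 + statistics.stdev(b) ** 2) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

N = 200              # hypothetical group size
FORMAT_BONUS = 0.5   # assumed boost (in SD units) from having practiced the test format

# Latent language ability: drawn from the same distribution for both groups,
# so any measured "effect" cannot be genuine learning.
ability_tblt = [random.gauss(0, 1) for _ in range(N)]
ability_ctrl = [random.gauss(0, 1) for _ in range(N)]

# Email-writing post-test: only the TBLT group has rehearsed this format.
email_tblt = [a + FORMAT_BONUS for a in ability_tblt]
email_ctrl = list(ability_ctrl)

print(f"Cohen's d on the email post-test: {cohens_d(email_tblt, email_ctrl):.2f}")
```

The simulation recovers a medium-sized effect purely from the format bonus, which is exactly the artefact the confound describes.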
A Real Example
In Kasap (2005), included in the [[The Bryfonski-McKay TBLT Meta-Analysis Controversy|Bryfonski & McKay (2019)]] meta-analysis, students practiced writing emails during the TBLT treatment, and a near-identical email-writing task was then used as the post-test. Boers & Faez (2023) flagged this as a borderline case: the practice-test overlap makes it impossible to separate task learning from language learning.
Why It's Insidious
Unlike missing pre-tests or vague control descriptions, practice-test congruency is easy to miss:
- It doesn't look like bad methodology on the surface — the test seems relevant to the learning objectives
- It often goes unreported because researchers don't think of it as a confound
- It inflates effect sizes in favour of whichever group's instruction more closely resembles the test
The Fix
Sound studies ensure assessment neutrality: the post-test should not structurally mirror either group's practice activities, or both groups should practice equally test-like activities. Alternatively, multiple outcome measures (some favouring each group's practice format) can reveal whether effects are genuine or artefacts of test format.
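The multiple-outcome-measures idea can be sketched by extending the same kind of simulation. All values here are assumptions for illustration: with no true learning difference, each group receives a hypothetical 0.5 SD familiarity bonus only on the test format it practiced. If the effects on the two measures point in opposite directions, that is a signal the "effect" is a test-format artefact rather than genuine learning.

```python
import random
import statistics

random.seed(1)

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two samples."""
    pooled = ((statistics.stdev(a) ** 2 + statistics.stdev(b) ** 2) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled

N = 200              # hypothetical group size
FORMAT_BONUS = 0.5   # assumed familiarity boost (SD units) for a practiced format

# Identical latent ability in both groups: no genuine learning difference.
ability_tblt = [random.gauss(0, 1) for _ in range(N)]
ability_ctrl = [random.gauss(0, 1) for _ in range(N)]

# Two outcome measures, one resembling each group's practice activities.
email_tblt = [a + FORMAT_BONUS for a in ability_tblt]    # TBLT practiced emails
email_ctrl = list(ability_ctrl)
grammar_tblt = list(ability_tblt)
grammar_ctrl = [a + FORMAT_BONUS for a in ability_ctrl]  # control practiced drills

d_email = cohens_d(email_tblt, email_ctrl)
d_grammar = cohens_d(grammar_tblt, grammar_ctrl)
print(f"email test d: {d_email:+.2f}, grammar test d: {d_grammar:+.2f}")
```

Opposite signs across the two measures, with zero true difference built into the simulation, is the pattern that multiple outcome measures are designed to expose.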
In Meta-Analyses
Boers et al. (2021) identified practice-test congruency as one of four reasons to exclude studies from their re-examination of Bryfonski & McKay (2019). It is an underappreciated confound across SLA research — not just in TBLT studies. Any comparison study where the treatment activity resembles the assessment instrument is vulnerable.
This problem also affects washback research: high-stakes tests shape instruction, which then looks like it "works" when measured by the same test format.