Between-Group Study
A between-group study (also called a between-subjects design) compares two or more groups that receive different treatments. In second language acquisition (SLA) research, this typically means a treatment group (e.g., receiving task-based language teaching, TBLT) compared with a control or comparison group (e.g., receiving traditional instruction or no instruction). Each participant experiences only one condition.
Why It Matters
Between-group designs are the standard for causal claims about instructional effectiveness. The logic: both groups experience the same passage of time, the same maturation, and the same external factors, so the only systematic difference between them is the treatment. Provided the groups were comparable at baseline, post-treatment differences can therefore be attributed to the treatment itself.
Between-Group vs. Within-Group
| Feature | Between-group | Within-group (pre-post) |
|---|---|---|
| Groups | Two or more separate groups | Same group tested twice |
| What it measures | Treatment-specific effect | Total change (all causes) |
| Causal logic | Stronger: comparison group controls for non-treatment factors | Weaker: cannot separate treatment from maturation, practice effects, regression to the mean |
| Effect size magnitude | Generally smaller | Generally larger |
| Effect size benchmarks: small / medium / large (Plonsky & Oswald, 2014) | 0.40 / 0.70 / 1.00 | 0.60 / 1.00 / 1.40 |
Within-group designs cannot rule out natural maturation, practice effects from repeated testing, regression to the mean (especially if learners were selected for low performance), or the Hawthorne effect.
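The benchmarks in the table above can be encoded as a small lookup, which makes it harder to judge a within-group d against between-group thresholds by accident. This is a minimal sketch: the names `BENCHMARKS` and `interpret_d` are hypothetical, and treating each threshold as an inclusive lower bound is an assumption rather than something the benchmarks themselves specify.

```python
# Field-specific benchmarks (small, medium, large) from Plonsky & Oswald (2014),
# as listed in the table above. The dictionary keys are illustrative labels.
BENCHMARKS = {
    "between": (0.40, 0.70, 1.00),
    "within": (0.60, 1.00, 1.40),
}


def interpret_d(d, design):
    """Label an effect size using the benchmarks for the given design type.

    Assumption: each threshold is an inclusive lower bound.
    """
    small, medium, large = BENCHMARKS[design]
    size = abs(d)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


# The same d reads differently depending on the design it came from.
print(interpret_d(0.70, "between"))  # medium
print(interpret_d(0.70, "within"))   # small
```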
Requirements for a Sound Between-Group Study
- Pre-test for both groups: establishes baseline equivalence. Without this, post-treatment differences might reflect pre-existing ability gaps, not the treatment.
- Random assignment (ideal) or demonstrated equivalence: ensures groups are comparable. True randomisation is the gold standard because it balances confounding variables, both measured and unmeasured, in expectation.
- Clear description of both groups: you need to know what each group actually did. A vague "traditional instruction" label for the control group makes interpretation impossible.
- Same outcome measure: both groups assessed with the same instrument under the same conditions.
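The first two requirements can be sketched in a few lines: shuffle participants into two groups, then look at the pre-test gap between them. This is an illustration only. The function names, the participant IDs, and the pre-test scores are all hypothetical, and a raw mean difference is just one simple way to inspect baseline equivalence.

```python
import random
from statistics import mean


def randomly_assign(participant_ids, seed):
    """Shuffle participants and split them into two groups of (near) equal size.

    A fixed seed makes the allocation reproducible and auditable.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def baseline_gap(pretest, group_a, group_b):
    """Raw pre-test mean difference between two groups (group_a minus group_b)."""
    return mean(pretest[p] for p in group_a) - mean(pretest[p] for p in group_b)


# Hypothetical participants and made-up pre-test scores, for illustration only.
ids = list(range(1, 21))
pretest = {pid: 50 + pid for pid in ids}

treatment, control = randomly_assign(ids, seed=42)
print(len(treatment), len(control))
print(baseline_gap(pretest, treatment, control))
```

With intact classes the assignment step is not available, so only the baseline check remains, which is why pre-testing carries more weight in quasi-experimental work.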
Effect Size Calculation
Post-test only: d = (M_treatment − M_control) / SD_pooled
With pre-test adjustment (preferred; Morris, 2008): d = [(M_treatment_post − M_treatment_pre) − (M_control_post − M_control_pre)] / SD_pooled_pre
The pre-test-adjusted formula uses the pooled pre-test SD as the standardiser (uncontaminated by the treatment) and controls for any baseline differences. The choice of formula and denominator can shift d by approximately 0.3, which is enough to move an effect from "small" to "medium" on the between-group benchmarks.
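The two formulas above can be compared directly on the same data. The sketch below implements them as written in this section; the function names and all scores are made up for illustration, with the treatment group deliberately starting one point ahead at pre-test.

```python
from math import sqrt
from statistics import mean, stdev


def pooled_sd(a, b):
    """Pooled standard deviation of two independent samples."""
    na, nb = len(a), len(b)
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))


def d_posttest_only(treat_post, ctrl_post):
    """d = (M_treatment - M_control) / SD_pooled, using post-test scores only."""
    return (mean(treat_post) - mean(ctrl_post)) / pooled_sd(treat_post, ctrl_post)


def d_pretest_adjusted(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference in mean gains, standardised by the pooled pre-test SD."""
    gain_treat = mean(treat_post) - mean(treat_pre)
    gain_ctrl = mean(ctrl_post) - mean(ctrl_pre)
    return (gain_treat - gain_ctrl) / pooled_sd(treat_pre, ctrl_pre)


# Hypothetical scores: the treatment group already leads by 1 point at pre-test.
treat_pre, treat_post = [10, 12, 14, 16, 18], [16, 18, 20, 22, 24]
ctrl_pre, ctrl_post = [9, 11, 13, 15, 17], [12, 14, 16, 18, 20]

print(f"{d_posttest_only(treat_post, ctrl_post):.2f}")                         # 1.26
print(f"{d_pretest_adjusted(treat_pre, treat_post, ctrl_pre, ctrl_post):.2f}")  # 0.95
```

In this toy data set the post-test-only estimate absorbs the baseline gap and comes out about 0.3 higher than the adjusted one. Morris (2008) additionally applies a small-sample bias correction, which is omitted here to match the formula as given above.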
The Problem of Quasi-Experimental Designs
Most SLA classroom research uses intact classes rather than true random assignment, which makes it quasi-experimental rather than truly experimental. Only an estimated 16–25% of experimental SLA/CALL studies use true random assignment. Reasons:
- Institutions assign students to classes; researchers cannot reshuffle them
- Ethical concerns about denying instruction to a control group
- Practical constraints of timetables and administration
With intact groups, systematic differences between classes (different teachers, time slots, motivation levels) become potential confounds. This makes pre-testing, transparent comparison group descriptions, and careful equivalence checks especially critical.
The Problem of Design Conflation
Pooling effect sizes from between-group and within-group studies in a single meta-analysis inflates the overall estimate. Within-group d values tend to be larger because they capture all change, not just treatment-specific effects. This was a core problem in the controversy over Bryfonski & McKay (2019): of 52 studies, only 27 were between-group designs, and even among those, many lacked pre-tests or adequate control group descriptions.
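The inflation is easy to see with a toy example. The effect sizes below are invented purely for illustration and do not come from any of the studies discussed here. The sketch also uses an unweighted mean for simplicity, whereas a real meta-analysis would typically weight studies, for example by inverse variance.

```python
from statistics import mean

# Hypothetical study records; the d values are made up for illustration.
studies = [
    {"design": "between", "d": 0.45},
    {"design": "between", "d": 0.60},
    {"design": "between", "d": 0.75},
    {"design": "within", "d": 1.10},
    {"design": "within", "d": 1.30},
]


def mean_d(records, design=None):
    """Unweighted mean d, optionally restricted to one design type."""
    selected = [r["d"] for r in records if design is None or r["design"] == design]
    return mean(selected)


print(f"between only: {mean_d(studies, 'between'):.2f}")  # 0.60
print(f"within only:  {mean_d(studies, 'within'):.2f}")   # 1.20
print(f"mixed pool:   {mean_d(studies):.2f}")             # 0.84
```

Reporting the estimate separately by design, as in the first two lines of output, keeps the two kinds of effect size from being read against the wrong benchmarks.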
Other Threats to Validity
- Attrition: participants dropping out differentially between groups can destroy equivalence. If weaker students drop out of the treatment group, the remaining group looks artificially better.
- Practice-test congruency: if the treatment group practises activities similar to the post-test but the control group does not, better performance may reflect test familiarity rather than genuine learning.
- Teacher effects: if different teachers teach each group, any differences might reflect the teacher rather than the treatment.
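The attrition threat in particular can be shown with simple arithmetic: if the weakest students leave one group, that group's mean rises without any learning having taken place. The scores and the helper name below are hypothetical.

```python
from statistics import mean


def mean_after_dropout(scores, n_dropped):
    """Mean of the scores that remain once the n_dropped lowest scorers leave."""
    remaining = sorted(scores)[n_dropped:]
    return mean(remaining)


# Made-up scores for a treatment group; no instruction effect is modelled at all.
treatment_scores = [40, 50, 60, 70, 80]

print(mean_after_dropout(treatment_scores, 0))  # 60
print(mean_after_dropout(treatment_scores, 2))  # 70
```

A ten-point apparent advantage appears here from dropout alone, which is why reporting attrition per group matters as much as reporting the final scores.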
In the TBLT Meta-Analysis Debate
| Source | Effect Size | Note |
|---|---|---|
| Bryfonski & McKay (2019) | d = 0.93 | From 27 between-group studies, but many lacked pre-tests |
| Xuan et al. (2022) | g = 0.61 | Recalculated from 16 better-screened studies |
| Boers et al. (2021) | Only 1 of 27 survived | Rigorous screening left a single usable study |
The takeaway for consumers of meta-analyses: always check whether the aggregate was calculated from between-group comparisons, within-group comparisons, or a mix of both.