Effect Size

An effect size quantifies the magnitude of a difference between groups or the strength of a relationship. In SLA research, it answers the question of how much a treatment helped, not just whether the difference is statistically detectable (which is what a p-value addresses).

Cohen's d

Cohen's d is the most common effect size in SLA intervention research. It is calculated as the difference between two group means divided by the pooled standard deviation, so it expresses the difference in standard deviation units.
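The calculation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name cohens_d and the scores are hypothetical, invented for the example, and do not come from any study.

```python
import math
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference between group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


# Hypothetical post-test scores, invented for illustration only.
treatment_scores = [78, 85, 82, 90, 74, 88]
control_scores = [72, 80, 75, 83, 70, 79]

d = cohens_d(treatment_scores, control_scores)
print(f"d = {d:.2f}")
```

A positive d means the treatment group scored higher on average. The sign flips if the groups are passed in the opposite order.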

Conventional benchmarks (Cohen, 1988):

d = 0.2 — small
d = 0.5 — medium
d = 0.8 — large

These are rules of thumb, not absolute standards. A "small" effect in a high-stakes context (e.g., d = 0.3 for a medical treatment) may be highly meaningful.

Hedges' g

Hedges' g is a corrected version of Cohen's d that adjusts for the upward bias d shows in small samples. It is preferred when studies have fewer than ~20 participants per group. For larger samples, d and g are nearly identical.
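As a rough sketch of how the correction works, the Python below multiplies d by the commonly used approximate correction factor 1 - 3 / (4 * df - 1), where df = n1 + n2 - 2. The function names are hypothetical, and the approximation is an assumption of this example; some software uses an exact gamma-function version instead.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def small_sample_correction(n1, n2):
    """Approximate correction factor; always below 1 and approaches 1 as n grows."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)


def hedges_g(group_a, group_b):
    """Cohen's d shrunk by the small-sample correction factor."""
    return cohens_d(group_a, group_b) * small_sample_correction(
        len(group_a), len(group_b)
    )


# Show how quickly the correction factor approaches 1 as group size increases.
for n in (5, 10, 20, 50):
    print(n, round(small_sample_correction(n, n), 3))
```

Because the factor is always below 1, g is slightly smaller in magnitude than d, and the gap narrows as the groups get larger.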

Why Effect Size Matters More Than p-Values

A study can find a "statistically significant" result (p < .05) with a trivially small effect size if the sample is large enough. Conversely, a meaningful effect can fail to reach significance in a small study. Effect size separates the size of the finding from the confidence we have in it.
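The dependence on sample size can be illustrated with a short sketch. For two equal-sized groups, the test statistic is roughly d * sqrt(n / 2), where n is the number of participants per group. The code below assumes a normal approximation to the t distribution, which is crude for small groups, and the function name and scenario values are hypothetical.

```python
import math
from statistics import NormalDist


def approx_two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a two-group comparison of equal size.

    Uses a normal approximation to the t distribution, so treat the
    result as a rough guide, especially when groups are small.
    """
    z = d * math.sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical scenarios: a tiny effect in a huge study and a large effect in a tiny one.
p_tiny_effect_big_sample = approx_two_sided_p(d=0.1, n_per_group=2000)
p_large_effect_small_sample = approx_two_sided_p(d=0.8, n_per_group=8)

print(f"d = 0.1, n = 2000 per group: p ~ {p_tiny_effect_big_sample:.4f}")
print(f"d = 0.8, n = 8 per group:    p ~ {p_large_effect_small_sample:.4f}")
```

Under these assumptions the tiny effect falls below the .05 threshold while the large effect does not, which is the pattern described above.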

In the TBLT Meta-Analysis Debate

Source	Effect Size	Interpretation
Bryfonski & McKay (2019)	d = 0.93	Large — but from methodologically flawed studies
Xuan et al. (2022) recalculation	g = 0.61	Medium — from better-screened subset
Norris & Ortega (2000)	d = 0.96	Large — but outcome measures biased toward explicit knowledge

The controversy illustrates that a large effect size is only as trustworthy as the studies producing it. Methodological flaws (practice-test congruency, missing pre-tests, design conflation) can all inflate effect sizes.