Statistical Significance
Statistical significance indicates whether an observed result is unlikely to have occurred by chance, given the assumption that no true effect exists (the null hypothesis). Conventionally, a result is deemed "statistically significant" when p < .05 — meaning there is less than a 5% probability of obtaining the observed result (or one more extreme) if the null hypothesis were true.
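As a minimal sketch (simulated scores, not from a real study; `scipy` assumed available), the comparison below runs a two-sample t-test and checks the result against the conventional .05 threshold:

```python
# Minimal sketch: a two-group comparison judged against the .05 convention.
# All data are simulated; the numbers are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=70, scale=10, size=30)    # 30 simulated test scores
treatment = rng.normal(loc=76, scale=10, size=30)  # built-in true difference of 6

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("significant at .05" if p_value < 0.05 else "not significant at .05")
```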
The p-value
A p-value is not the probability that the null hypothesis is true. It is the probability of the observed data (or more extreme data) given that the null hypothesis is true. This distinction is widely misunderstood.
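In symbols, for a two-sided test with statistic $T$ and observed value $t_{\text{obs}}$:

$$
p \;=\; \Pr\!\left(|T| \geq |t_{\text{obs}}| \mid H_0\right)
\;\neq\; \Pr\!\left(H_0 \mid \text{data}\right)
$$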
| Common misconception | Reality |
|---|---|
| p = .03 means there's a 3% chance the result is due to chance | p = .03 means data this extreme would occur 3% of the time if H₀ were true |
| p < .05 means the effect is real | It means we reject H₀ at the 5% threshold; the effect could still be trivially small |
| p > .05 means no effect | It means we failed to detect an effect; there may be one we lacked power to find |
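The first row can be made concrete with a simulation (a sketch with assumed parameters): when H₀ is true, p-values are uniformly distributed, so the .05 threshold produces false positives in about 5% of experiments by construction.

```python
# Sketch: when H0 is true (both groups drawn from the same population),
# about 5% of experiments still yield p < .05: false positives by design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 10_000, 30

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0, 1, n_per_group)  # same population for both groups
    b = rng.normal(0, 1, n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"fraction with p < .05 under H0: {false_positives / n_experiments:.3f}")  # ~0.05
```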
The Critical Distinction: Statistical vs Practical Significance
A large sample can make trivially small differences "statistically significant." With 1,000 participants per group, even a difference of 0.5 points on a 100-point test might yield p < .001, yet the difference is educationally meaningless.
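The arithmetic behind this claim can be checked directly from summary statistics (the SD of 3 points below is an assumption; the text specifies only the mean difference and group size):

```python
# Deterministic sketch from assumed summary statistics: a 0.5-point mean
# difference on a 100-point test, SD = 3, n = 1,000 per group.
import math
from scipy import stats

n = 1_000
mean_diff = 0.5  # half a point on a 100-point scale
sd = 3.0         # assumed within-group standard deviation

se = sd * math.sqrt(2 / n)                # standard error of the difference
t = mean_diff / se                        # t ≈ 3.73
p = 2 * stats.t.sf(abs(t), df=2 * n - 2)  # two-sided p-value ≈ .0002
d = mean_diff / sd                        # Cohen's d ≈ 0.17: small

print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
# p < .001, yet the difference remains half a point on a 100-point test
```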
Conversely, a meaningful effect can fail to reach significance in a small study (common in SLA research, where class sizes are 15-30). This is a power problem, not evidence of no effect.
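A power simulation makes the problem visible (a sketch with assumed values: a genuine medium effect of d = 0.5 studied with 20 participants per group):

```python
# Sketch: a real medium effect (d = 0.5) studied with n = 20 per group.
# Power is only ~0.34, so roughly two thirds of such studies come up
# "non-significant" even though the effect exists.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_studies, n_per_group, true_d = 10_000, 20, 0.5

detected = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)  # the effect is real
    if stats.ttest_ind(treatment, control).pvalue < 0.05:
        detected += 1

print(f"estimated power: {detected / n_studies:.2f}")  # ~0.33 under these assumptions
```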
This is precisely why Effect Size reporting is essential. Effect size tells you how big the difference is; p-values tell you only whether you can be confident it is not zero.
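For a two-group comparison, the most common index is Cohen's d, the mean difference in pooled standard deviation units:

$$
d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$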
In SLA Research
The over-reliance on p-values has been criticised extensively:
- Norris & Ortega (2000) called for mandatory Effect Size reporting alongside p-values
- Plonsky (2013, 2014) documented that many SLA studies were underpowered — sample sizes too small to detect real effects
- The American Psychological Association (APA, 2001) required the reporting of effect sizes and confidence intervals
- The American Statistical Association (Wasserstein & Lazar, 2016) issued a statement warning against p-value thresholds as sole arbiters of scientific findings
Confidence Intervals
A confidence interval provides a range of plausible values for the true effect. A 95% CI that excludes zero is equivalent to p < .05 in the corresponding two-sided test, but it is more informative: it shows the precision of the estimate and the full range of effects compatible with the data.
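A sketch of the computation for a two-group mean difference (simulated data, equal variances assumed), using the same pooled-SD logic as Cohen's d above:

```python
# Sketch: 95% CI for a difference between two group means (equal variances
# assumed). An interval that excludes zero corresponds to p < .05 two-sided.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(70, 10, 25)    # simulated scores, 25 per group
treatment = rng.normal(76, 10, 25)

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()
sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
              (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))  # pooled SD
se = sp * np.sqrt(1 / n1 + 1 / n2)            # SE of the mean difference
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)   # two-sided 95% critical value

lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```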
Key References
- Fisher (1925) — originated the p < .05 threshold
- Cohen (1994) — "The earth is round (p < .05)" — landmark critique
- Wasserstein & Lazar (2016) — ASA statement on p-values
- Norris & Ortega (2000) — effect size reporting in SLA
- Plonsky & Oswald (2014) — field-specific effect size benchmarks