Statistical Significance
Statistical significance indicates whether an observed result is unlikely to have occurred by chance, given the assumption that no true effect exists (the null hypothesis). Conventionally, a result is deemed "statistically significant" when p < .05 — meaning there is less than a 5% probability of obtaining the observed result (or one more extreme) if the null hypothesis were true.
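As a minimal sketch (simulated scores, not from a real study; `scipy` assumed available), the comparison below runs a two-sample t-test and checks the result against the conventional .05 threshold:

```python
# Minimal sketch: a two-group comparison judged against the .05 convention.
# All data are simulated; the numbers are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=70, scale=10, size=30)    # 30 simulated test scores
treatment = rng.normal(loc=76, scale=10, size=30)  # built-in true difference of 6

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("significant at .05" if p_value < 0.05 else "not significant at .05")
```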
The p-value
A p-value is not the probability that the null hypothesis is true. It is the probability of the observed data (or more extreme data) given that the null hypothesis is true. This distinction is widely misunderstood.
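In symbols, for a two-sided test with statistic $T$ and observed value $t_{\text{obs}}$:

$$
p \;=\; \Pr\!\left(|T| \geq |t_{\text{obs}}| \mid H_0\right)
\;\neq\; \Pr\!\left(H_0 \mid \text{data}\right)
$$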
| Common misconception | Reality |
|---|---|
| p = .03 means there's a 3% chance the result is due to chance | p = .03 means data this extreme would occur 3% of the time if H₀ were true |
| p < .05 means the effect is real | It means we reject H₀ at the 5% threshold; the effect could still be trivially small |
| p > .05 means no effect | It means we failed to detect an effect; there may be one we lacked power to find |
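The first row can be made concrete with a simulation (a sketch with assumed parameters): when H₀ is true, p-values are uniformly distributed, so the .05 threshold produces false positives in about 5% of experiments by construction.

```python
# Sketch: when H0 is true (both groups drawn from the same population),
# about 5% of experiments still yield p < .05: false positives by design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments, n_per_group = 10_000, 30

false_positives = 0
for _ in range(n_experiments):
    a = rng.normal(0, 1, n_per_group)  # same population for both groups
    b = rng.normal(0, 1, n_per_group)
    if stats.ttest_ind(a, b).pvalue < 0.05:
        false_positives += 1

print(f"fraction with p < .05 under H0: {false_positives / n_experiments:.3f}")  # ~0.05
```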
The Critical Distinction: Statistical vs Practical Significance
A large sample can make trivially small differences "statistically significant." With 1,000 participants per group, even a difference of 0.5 points on a 100-point test might yield p < .001, yet the difference is educationally meaningless.
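The arithmetic behind this claim can be checked directly from summary statistics (the SD of 3 points below is an assumption; the text specifies only the mean difference and group size):

```python
# Deterministic sketch from assumed summary statistics: a 0.5-point mean
# difference on a 100-point test, SD = 3, n = 1,000 per group.
import math
from scipy import stats

n = 1_000
mean_diff = 0.5  # half a point on a 100-point scale
sd = 3.0         # assumed within-group standard deviation

se = sd * math.sqrt(2 / n)                # standard error of the difference
t = mean_diff / se                        # t ≈ 3.73
p = 2 * stats.t.sf(abs(t), df=2 * n - 2)  # two-sided p-value ≈ .0002
d = mean_diff / sd                        # Cohen's d ≈ 0.17: small

print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
# p < .001, yet the difference remains half a point on a 100-point test
```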
Conversely, a meaningful effect can fail to reach significance in a small study (common in SLA research, where class sizes are 15-30). This is a power problem, not evidence of no effect.
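A power simulation makes the problem visible (a sketch with assumed values: a genuine medium effect of d = 0.5 studied with 20 participants per group):

```python
# Sketch: a real medium effect (d = 0.5) studied with n = 20 per group.
# Power is only ~0.34, so roughly two thirds of such studies come up
# "non-significant" even though the effect exists.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_studies, n_per_group, true_d = 10_000, 20, 0.5

detected = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)  # the effect is real
    if stats.ttest_ind(treatment, control).pvalue < 0.05:
        detected += 1

print(f"estimated power: {detected / n_studies:.2f}")  # ~0.33 under these assumptions
```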
This is precisely why Effect Size reporting is essential. Effect size tells you how big the difference is; p-values tell you only whether you can be confident it is not zero.
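For a two-group comparison, the most common index is Cohen's d, the mean difference in pooled standard deviation units:

$$
d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}
$$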
In SLA Research
The over-reliance on p-values has been criticised extensively:
- Norris & Ortega (2000) called for mandatory Effect Size reporting alongside p-values
- Plonsky (2013, 2014) documented that many SLA studies were underpowered — sample sizes too small to detect real effects
- The American Psychological Association (APA, 2001) required the reporting of effect sizes and confidence intervals
- The American Statistical Association (Wasserstein & Lazar, 2016) issued a statement warning against p-value thresholds as sole arbiters of scientific findings
Confidence Intervals
A confidence interval provides a range of plausible values for the true effect. A 95% CI that excludes zero is equivalent to p < .05 in the corresponding two-sided test, but it is more informative: it shows the precision of the estimate and the full range of effects compatible with the data.
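A sketch of the computation for a two-group mean difference (simulated data, equal variances assumed), using the same pooled-SD logic as Cohen's d above:

```python
# Sketch: 95% CI for a difference between two group means (equal variances
# assumed). An interval that excludes zero corresponds to p < .05 two-sided.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
control = rng.normal(70, 10, 25)    # simulated scores, 25 per group
treatment = rng.normal(76, 10, 25)

n1, n2 = len(treatment), len(control)
diff = treatment.mean() - control.mean()
sp = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
              (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))  # pooled SD
se = sp * np.sqrt(1 / n1 + 1 / n2)            # SE of the mean difference
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)   # two-sided 95% critical value

lo, hi = diff - t_crit * se, diff + t_crit * se
print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```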
Key References
- Fisher (1925) — originated the p < .05 threshold
- Cohen (1994) — "The earth is round (p < .05)" — landmark critique
- Wasserstein & Lazar (2016) — ASA statement on p-values
- Norris & Ortega (2000) — effect size reporting in SLA
- Plonsky & Oswald (2014) — field-specific effect size benchmarks