p-value
The probability of observing data at least as extreme as the data actually obtained, assuming the Null Hypothesis is true. A small p-value signals that the observed result would be unlikely under the null model, and is conventionally taken as evidence against that null.
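The definition can be made concrete with an exact permutation test, which counts how many relabellings of the pooled scores give a group difference at least as extreme as the one observed. The sketch below is illustrative only: the function name `permutation_p_value` and the toy scores are invented here, and it assumes a two-sided test on the difference in means with samples small enough to enumerate every split.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)

    as_extreme = 0
    n_splits = 0
    # Under the null, group labels are exchangeable, so every way of
    # assigning n_a of the pooled scores to group A is equally likely.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits


# Hypothetical toy scores, invented for illustration only.
p = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p)  # 0.1: only 2 of the 20 possible splits are as extreme as the observed one
```

Even with complete separation between the two toy groups, p cannot fall below .10 here, because three scores per group allow only 20 distinct splits.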
Conventions
The threshold α = .05 became conventional in applied research after Fisher's Statistical Methods for Research Workers (1925), which noted that a deviation of 1.96, or roughly twice the standard deviation, corresponds to P = .05 and proposed it as a convenient limit for judging significance. Common cut-offs in SLA reporting are .05, .01, and .001, with results below the chosen α labelled "statistically significant". The p-value itself is neither a measure of Effect Size nor the probability that the null is true.
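The link between "about two standard errors" and α = .05 can be checked numerically. The following is a minimal sketch that assumes a standard normal test statistic; the helper names `two_sided_p` and `significance_label` are hypothetical and exist only for this illustration.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significance_label(p, alpha=0.05):
    """Apply the conventional cut-off rule: significant only if p < alpha."""
    if p < alpha:
        return "statistically significant"
    return "not statistically significant"


for z in (1.96, 2.0):
    p = two_sided_p(z)
    # Compare the same p against the three common reporting thresholds.
    labels = [significance_label(p, a) for a in (0.05, 0.01, 0.001)]
    print(z, round(p, 4), labels)
```

A statistic of 1.96 gives p of almost exactly .05, and a statistic of 2.0 gives roughly .0455, so both clear the .05 cut-off but neither clears .01.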
Misinterpretations
Frequent misreadings include treating p as the probability that H₀ is true, as the probability that the result was due to chance, or as a measure of the size of an effect. The 2016 American Statistical Association statement on p-values rejected each of these readings and stressed that p-values, used alone, do not measure the importance of a result or the truth of a hypothesis. The statement urged researchers to report effect sizes, confidence intervals, and study context alongside any p-value.
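One way to see why p is not a measure of effect size is to hold the standardized effect fixed and vary only the sample size. The sketch below assumes a two-group comparison with equal group sizes and a known-variance normal approximation, under which the test statistic is roughly d·√(n/2); the function name `p_for_effect` and the chosen values of d and n are illustrative assumptions, not figures from any study.

```python
from math import sqrt
from statistics import NormalDist


def p_for_effect(d, n_per_group):
    """Approximate two-sided p-value when the observed standardized mean
    difference is d and each of two groups has n_per_group participants
    (normal approximation, equal variances assumed)."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same small effect (d = 0.2), three hypothetical sample sizes.
for n in (20, 200, 2000):
    print(n, p_for_effect(0.2, n))
```

With the effect held at d = 0.2, the p-value moves from well above .05 at n = 20 per group to below .05 at n = 200 and far below it at n = 2000, although the effect itself has not changed.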
Use in SLA
L2 research has long relied on p-based significance testing inherited from psychology and education, but recent methodological reviews, including Plonsky and Oswald (2014), have pushed the field toward routinely reporting effect sizes and confidence intervals, with p-values as one input among several rather than the primary criterion. Underpowered designs, common in classroom-based quasi-experimental studies, make any single p-value especially fragile and strengthen the case for Replication and meta-analytic synthesis.
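The fragility of p-values in underpowered designs can be illustrated with an approximate power calculation. This is a rough sketch assuming a two-sided, two-group comparison with equal group sizes and a normal approximation to the test statistic; the function name `approx_power`, the effect size d = 0.5, and the group sizes are hypothetical choices for illustration, not values drawn from the L2 literature.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate probability of obtaining p < alpha when the true
    standardized mean difference is d (normal approximation)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability that the statistic lands in either rejection region.
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


# A small class-sized sample versus a larger one, same assumed effect.
for n in (15, 64):
    print(n, round(approx_power(0.5, n), 2))
```

Under these assumptions, a study with 15 learners per group would detect a true effect of d = 0.5 well under half the time, so a significant and a non-significant result from two such studies need not contradict each other.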
References
- Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
- Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
- Plonsky, L., & Oswald, F. L. (2014). How big is "big"? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.