p-value

Research Methodology · p value · probability value

The probability of observing data at least as extreme as the data actually obtained, assuming the Null Hypothesis is true. A small p-value signals that the observed result would be unlikely under the null model, and is conventionally taken as evidence against that null.
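The definition can be made concrete with a small exact calculation. The sketch below is illustrative only: the function name and the scenario (15 of 20 learners improving, tested against a null of no systematic change, i.e. a 50/50 split) are hypothetical and not drawn from any study. It sums the null probabilities of every outcome at least as far from the expected count as the one observed.

```python
from math import comb


def binomial_two_sided_p(successes, trials):
    """Exact two-sided p-value under a null success probability of 0.5.

    "At least as extreme" is taken here to mean "at least as far from the
    expected count (trials / 2) as the observed count". That choice is an
    assumption of this sketch; it is natural only because the null
    distribution is symmetric.
    """
    expected = trials / 2
    observed_distance = abs(successes - expected)
    total_outcomes = 2 ** trials
    p_value = 0.0
    for k in range(trials + 1):
        if abs(k - expected) >= observed_distance:
            # Null probability of exactly k successes in `trials` attempts.
            p_value += comb(trials, k) / total_outcomes
    return p_value


# Hypothetical data: 15 of 20 learners improve.
print(round(binomial_two_sided_p(15, 20), 4))  # about 0.0414
```

The result describes how surprising the data would be if the null were true. It says nothing directly about how likely the null itself is.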

Conventions

The threshold α = .05 became standard in applied research after Fisher's Statistical Methods for Research Workers (1925), where he proposed a deviation of about twice its standard error, which corresponds to P ≈ .05 under a normal distribution, as a rough convenience for judging significance. Common cut-offs in SLA reporting are .05, .01, and .001, with results below the chosen α labelled "statistically significant". The p-value itself is not a measure of Effect Size, nor is it the probability that the null is true.
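The link between "twice the standard error" and .05 can be checked numerically. The sketch below assumes a normally distributed test statistic, and the function name is made up for illustration. It converts a z value into a two-sided tail probability using the complementary error function.

```python
from math import erfc, sqrt


def two_sided_p_from_z(z):
    """Two-sided tail probability of a standard normal statistic.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    return erfc(abs(z) / sqrt(2))


print(round(two_sided_p_from_z(1.96), 4))  # 0.05
print(round(two_sided_p_from_z(2.0), 4))   # 0.0455
```

A deviation of exactly two standard errors is therefore slightly stricter than .05, which is why the rule is better read as a convenience than as a derived constant.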

Misinterpretations

Frequent misreadings include treating p as the probability that H₀ is true, the probability the result was due to chance, or as a measure of the size of an effect. The 2016 American Statistical Association statement on p-values rejected each of these readings and stressed that p-values, used alone, do not measure the importance of a result or the truth of a hypothesis. The statement urged researchers to report effect sizes, confidence intervals, and study context alongside any p-value.
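One way to see why p cannot stand in for effect size is to hold the effect fixed and vary only the sample size. The sketch below is a simplification resting on stated assumptions: two equal-sized groups, a known common standard deviation, and a large-sample z approximation rather than a t test. The function name and the numbers are hypothetical.

```python
from math import erfc, sqrt


def p_for_standardized_difference(d, n_per_group):
    """Two-sided p-value for an observed standardized mean difference d.

    Assumes two independent groups of equal size and a z approximation,
    so the test statistic is z = d * sqrt(n_per_group / 2).
    """
    z = d * sqrt(n_per_group / 2)
    return erfc(abs(z) / sqrt(2))


same_effect = 0.2  # identical standardized difference in both studies

p_small_study = p_for_standardized_difference(same_effect, 20)
p_large_study = p_for_standardized_difference(same_effect, 2000)

# The effect is the same size; only the amount of data differs.
print(p_small_study > 0.05)   # True: not significant with 20 per group
print(p_large_study < 0.001)  # True: highly significant with 2000 per group
```

Two studies with the same observed effect can therefore land on opposite sides of α, which is why effect sizes and confidence intervals need to be reported alongside p.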

Use in SLA

L2 research has long relied on p-based significance testing inherited from psychology and education, but recent methodological reviews, including Plonsky and Oswald (2014), have pushed the field toward routinely reporting effect sizes and confidence intervals, treating p-values as one input among several rather than the primary criterion. Underpowered designs, common in classroom-based quasi-experimental studies, make any single p-value especially fragile and strengthen the case for Replication and meta-analytic synthesis.
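The fragility of p-values in small samples can be illustrated with a rough power calculation. The sketch below uses assumptions chosen for illustration: a two-sided, two-group z approximation, a true standardized effect of 0.5, and group sizes of 15 and 64. These figures are hypothetical, not taken from any cited study, and a t-based calculation would give slightly lower values at small n.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group z test.

    d is the assumed true standardized mean difference. Power is the
    probability that the test statistic falls beyond either critical value.
    """
    standard_normal = NormalDist()
    z_critical = standard_normal.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_group / 2)  # expected value of the z statistic
    upper_tail = standard_normal.cdf(shift - z_critical)
    lower_tail = standard_normal.cdf(-shift - z_critical)
    return upper_tail + lower_tail


# Hypothetical intact-class study versus a larger design, same true effect.
print(approximate_power(0.5, 15) < 0.35)  # True: detected well under half the time
print(approximate_power(0.5, 64) > 0.78)  # True: close to the conventional 0.80
```

With power this low, a significant result from a single small study is as much a matter of sampling luck as of the underlying effect, which is the practical argument for replication and pooled estimates.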

References

Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129–133.
Plonsky, L., & Oswald, F. L. (2014). How big is "big"? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912.

Related Terms