The reign of the p-value is over: what alternative analyses could we employ to fill the power vacuum?

The p-value has long been the figurehead of statistical analysis in biology, but its position is under threat. It is now widely recognized as providing quite limited information about our data and as being easily misinterpreted. Many biologists are aware of the p-value's frailties but are less clear about how they might change the way they analyse their data in response. This article highlights and summarizes four broad statistical approaches that augment or replace the p-value and that are relatively straightforward to apply. First, you can augment your p-value with information about how confident you are in it, how likely it is that you will get a similar p-value in a replicate study, or the probability that a statistically significant finding is in fact a false positive. Second, you can enhance the information provided by frequentist statistics with a focus on effect sizes and a quantified confidence that those effect sizes are accurate. Third, you can augment or substitute p-values with the Bayes factor to inform on the relative levels of evidence for the null and alternative hypotheses; this approach is particularly appropriate for studies where you wish to keep collecting data until clear evidence for or against your hypothesis has accrued. Finally, specifically where you are using multiple variables to predict an outcome through model building, the Akaike information criterion (AIC) can take the place of the p-value, providing quantified information on which model is best. Hopefully, this quick-and-easy guide to some simple yet powerful statistical options will support biologists in adopting new approaches where they feel that the p-value alone is not doing their data justice.
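As a rough illustration of three of the quantities mentioned above, the following minimal Python sketch computes a false positive risk, a standardized effect size, and Akaike weights. It is not taken from the article: the function names are hypothetical, the false positive risk uses the simple "p at or below alpha" formulation (which needs an assumed prior probability that a real effect exists), the effect size is Cohen's d with a pooled standard deviation, and the Akaike weights follow the standard definition based on AIC differences. The Bayes factor is left out because it depends on a choice of prior that a short sketch cannot settle.

```python
import math
from statistics import mean, stdev


def false_positive_risk(alpha, power, prior):
    """Fraction of 'significant' results expected to be false positives.

    alpha: significance threshold; power: probability of detecting a real
    effect; prior: ASSUMED probability that a real effect exists.
    """
    false_pos = alpha * (1.0 - prior)  # true nulls that reach significance
    true_pos = power * prior           # real effects that reach significance
    return false_pos / (false_pos + true_pos)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def akaike_weights(aic_values):
    """Relative support for each candidate model, given their AIC scores."""
    best = min(aic_values)
    rel = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Illustrative inputs only; none of these numbers come from the article.
fpr = false_positive_risk(alpha=0.05, power=0.8, prior=0.1)  # 0.36
d = cohens_d([2, 4, 6], [1, 3, 5])                           # 0.5
weights = akaike_weights([100.0, 102.0])                     # lower AIC gets more weight
```

With a 10% prior chance of a real effect, more than a third of results significant at 0.05 would be false positives in this formulation, which is why the prior has to be stated explicitly whenever such a figure is reported.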
