Is the call to abandon p-values the red herring of the replicability crisis?

In a recent article, Cumming (2014) called for two major changes to how psychologists conduct research. The first suggested change, encouraging transparency and replication, is clearly worthwhile, but we question the wisdom of the second: abandoning p-values and reporting only confidence intervals (CIs) in all psychological research reports. This article has three goals. First, we correct the false impression created by Cumming that the debate about the usefulness of null hypothesis significance testing (NHST) has been won by its critics. Second, we take issue with the implied connection between the use of NHST and the current crisis of replicability in psychology. Third, while we agree with other critics of Cumming (2014) that hypothesis testing is an important part of science (Morey et al., 2014), we express skepticism that alternative hypothesis testing frameworks, such as Bayes factors, are a solution to the replicability crisis. Poor methodological practices can compromise the validity of Bayesian and classical statistical analyses alike. When it comes to choosing between competing statistical approaches, we highlight the value of applying the same standards of evidence that psychologists demand in choosing between competing substantive hypotheses.

[1]  D. Krantz The Null Hypothesis Testing Controversy in Psychology , 1999 .

[2]  Brian A. Nosek,et al.  Power failure: why small sample size undermines the reliability of neuroscience , 2013, Nature Reviews Neuroscience.

[3]  G. Loewenstein,et al.  Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling , 2012, Psychological science.

[4]  Jeffrey Bowers,et al.  Article Commentary: On the Persistence of Low Power in Psychological Science , 2014, Quarterly journal of experimental psychology.

[5]  D. Bem Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. , 2011, Journal of personality and social psychology.

[6]  F. Schmidt Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers , 1996 .

[7]  Jeffrey N. Rouder,et al.  Why Hypothesis Tests Are Essential for Psychological Science , 2014, Psychological science.

[8]  L. Harlow,et al.  What if there were no significance tests , 1997 .

[9]  J. Krueger,et al.  Null hypothesis significance testing. On the survival of a flawed method. , 2001, The American psychologist.

[10]  G. Newman,et al.  CONFIDENCE INTERVALS , 1987, The Lancet.

[11]  E. Wagenmakers,et al.  Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). , 2011, Journal of personality and social psychology.

[12]  Jeffrey N. Rouder,et al.  Bayesian t tests for accepting and rejecting the null hypothesis , 2009, Psychonomic bulletin & review.

[13]  Jeffrey N. Rouder,et al.  Robust misinterpretation of confidence intervals , 2013, Psychonomic bulletin & review.

[14]  R. Frick,et al.  The appropriate use of null hypothesis testing. , 1996 .

[15]  Uri Simonsohn,et al.  Posterior-Hacking: Selective Reporting Invalidates Bayesian Results Also , 2014 .

[16]  Rick P. Thomas,et al.  When decision heuristics and science collide , 2013, Psychonomic Bulletin & Review.

[17]  Jacob Cohen The earth is round (p < .05) , 1994 .

[18]  W. Dunlap,et al.  On the Logic and Purpose of Significance Testing , 1997 .

[19]  The "difference of means" may not be the "effect size". , 1995 .

[20]  Lisa L. Harlow,et al.  Eight Common but False Objections to the Discontinuation of Significance Testing in the Analysis of Research Data , 2016 .

[21]  Thomas T. Hills,et al.  The frequentist implications of optional stopping on Bayesian hypothesis tests , 2013, Psychonomic Bulletin & Review.

[23]  Z. Dienes Bayesian Versus Orthodox Statistics: Which Side Are You On? , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[24]  R. L. Hagen In praise of the null hypothesis statistical test. , 1997 .

[25]  J. Neyman,et al.  On the Problem of Confidence Intervals , 1935 .

[26]  G. Cumming,et al.  The New Statistics , 2014, Psychological science.

[27]  W. Johnson,et al.  Must psychologists change the way they analyze their data? , 2011, Journal of personality and social psychology.

[28]  G. Cumming,et al.  Confidence Intervals Permit, but Do Not Guarantee, Better Inference than Statistical Significance Testing , 2010, Front. Psychology.

[29]  Leif D. Nelson,et al.  False-Positive Psychology , 2011, Psychological science.

[30]  G. Loftus,et al.  Why Figures with Error Bars Should Replace p Values: Some Conceptual Arguments and Empirical Demonstrations , 2015 .

[31]  Brian A. Nosek,et al.  Scientific Utopia , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.

[32]  G. Cumming,et al.  Researchers misunderstand confidence intervals and standard error bars. , 2005, Psychological methods.

[33]  E. Wagenmakers A practical solution to the pervasive problems of p values , 2007, Psychonomic bulletin & review.

[34]  R. Nickerson,et al.  Null hypothesis significance testing: a review of an old and continuing controversy. , 2000, Psychological methods.

[35]  M. Lee,et al.  Statistical Evidence in Experimental Psychology , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[36]  P. Meehl Why Summaries of Research on Psychological Theories are Often Uninterpretable , 1990 .

[37]  Z. Dienes How Bayes factors change scientific practice , 2016 .

[38]  Richard A. Harshman,et al.  There Is a Time and a Place for Significance Testing , 2016 .

[39]  J. Rouder Optional stopping: No problem for Bayesians , 2014, Psychonomic bulletin & review.

[40]  What is the probability that null hypothesis testing is meaningless , 1995 .

[41]  H. Pashler,et al.  Editors’ Introduction to the Special Section on Replicability in Psychological Science , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.

[42]  Rink Hoekstra,et al.  Confidence Intervals Make a Difference , 2012 .

[43]  John K Kruschke,et al.  Bayesian Assessment of Null Values Via Parameter Estimation and Model Comparison , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[44]  B. Lecoutre,et al.  Interpretation of significance levels by psychological researchers: The .05 cliff effect may be overstated , 2001, Psychonomic bulletin & review.

[45]  J. Ioannidis Why Most Published Research Findings Are False , 2005, PLoS medicine.