P-Values: Misunderstood and Misused

P-values are widely used in both the social and natural sciences to quantify the statistical significance of observed results. The recent surge of big data research has made the p-value an even more popular tool for testing the significance of findings. However, a substantial literature has emerged critiquing how p-values are used and understood. In this paper we review this recent critical literature, much of which is rooted in the life sciences, and consider its implications for social scientific research. We provide a coherent picture of the main criticisms, and draw together and disambiguate common themes. In particular, we explain how the False Discovery Rate is calculated, and how it differs from a p-value. We also make explicit the Bayesian nature of many recent criticisms, a dimension that is often underplayed or ignored. We conclude by identifying practical steps to help remediate some of the concerns identified. We recommend that (i) far lower significance levels are used, such as 0.01 or 0.001, and (ii) p-values are interpreted contextually, and situated within both the findings of the individual study and the broader field of inquiry (through, for example, meta-analyses).
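The distinction between the False Discovery Rate and a p-value can be made concrete with a minimal numerical sketch. The formula and the illustrative values below (a significance threshold of 0.05, statistical power of 0.8, and a prior probability of 0.1 that a tested hypothesis is true) are assumptions chosen for illustration, in the spirit of the screening argument common in this critical literature; they are not taken from the paper itself.

```python
def false_discovery_rate(alpha: float, power: float, prior: float) -> float:
    """Expected fraction of 'significant' results that are false positives.

    alpha: significance threshold (probability of a false positive
           when the null hypothesis is true)
    power: probability of detecting a real effect when one exists
    prior: proportion of tested hypotheses that are actually true
    """
    false_positives = alpha * (1 - prior)   # nulls wrongly rejected
    true_positives = power * prior          # real effects detected
    return false_positives / (false_positives + true_positives)

# Even at the conventional 0.05 threshold, if only 10% of tested
# hypotheses are true, over a third of significant results are false:
print(false_discovery_rate(alpha=0.05, power=0.8, prior=0.1))  # 0.36
```

The key point the sketch illustrates is that the p-value threshold (alpha) bounds the false-positive rate *given that the null is true*, whereas the FDR describes the proportion of significant findings that are spurious, which additionally depends on power and the prior plausibility of the hypotheses being tested.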
