Detecting Evidential Value and p-Hacking With the p-Curve Tool: A Word of Caution

Simonsohn, Nelson, and Simmons (2014a) proposed p-curve – the distribution of statistically significant p-values for a set of studies – as a tool to assess the evidential value of these studies. They argued that, whereas right-skewed p-curves indicate true underlying effects, left-skewed p-curves indicate selective reporting of significant results when there is no true effect (“p-hacking”). We first review previous research showing that, in contrast to the first claim, null effects may produce right-skewed p-curves under some conditions. We then question the second claim by showing that not only selective reporting but also selective nonreporting of significant results due to a significant outcome of a more popular alternative test of the same hypothesis may produce left-skewed p-curves, even if all studies reflect true effects. Hence, just as right-skewed p-curves do not necessarily imply evidential value, left-skewed p-curves do not necessarily imply p-hacking and absence of true effects in the studies involved.
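The intuition behind the p-curve can be made concrete with a small simulation (an illustrative sketch of the general idea, not the authors' code or the official p-curve app; the z-test, effect size d = .5, and sample size n = 20 are assumptions chosen for illustration): significant p-values from studies of a true effect pile up near zero, whereas under the null hypothesis they are uniformly distributed below the significance level.

```python
import math
import random
from statistics import NormalDist

random.seed(0)

def sig_pvalues(effect=0.0, n=20, n_studies=50_000, alpha=0.05):
    """Simulate two-sided z-tests (known SD = 1) and keep the
    significant p-values -- the input to a p-curve analysis.
    Illustrative sketch only, not the code used in the paper."""
    pvals = []
    for _ in range(n_studies):
        xbar = random.gauss(effect, 1.0 / math.sqrt(n))  # sample mean
        z = xbar * math.sqrt(n)                          # test statistic
        p = 2 * (1 - NormalDist().cdf(abs(z)))           # two-sided p-value
        if p < alpha:
            pvals.append(p)
    return pvals

p_true = sig_pvalues(effect=0.5)  # true medium effect (d = .5)
p_null = sig_pvalues(effect=0.0)  # null effect

# Share of significant p-values below .025:
# .5 means a flat p-curve, > .5 a right-skewed one.
frac = lambda ps: sum(p < 0.025 for p in ps) / len(ps)
print(round(frac(p_true), 2))  # clearly above .5: right-skewed
print(round(frac(p_null), 2))  # close to .5: flat, not left-skewed
```

Because a null effect yields a flat (not left-skewed) p-curve, a left skew can only arise through selection processes operating on which significant results get reported, which is the point at issue in the abstract above.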

[1]  Sven Kepes,et al.  Publication Bias: A call for improved meta-analytic practice in the organizational sciences , 2012 .

[2]  J. Ioannidis Why Most Discovered True Associations Are Inflated , 2008, Epidemiology.

[3]  Felix D. Schönbrodt,et al.  Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods , 2019, Advances in Methods and Practices in Psychological Science.

[4]  E. Erdfelder,et al.  Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses , 2009, Behavior research methods.

[5]  M. Borenstein,et al.  Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments , 2006 .

[6]  Brian A. Nosek,et al.  Power failure: why small sample size undermines the reliability of neuroscience , 2013, Nature Reviews Neuroscience.

[7]  Daniel Lakens What p-hacking really looks like , 2014 .

[8]  Brian A. Nosek,et al.  Registered Reports: A Method to Increase the Credibility of Published Results , 2014 .

[9]  Rolf Ulrich,et al.  Effect Size Estimation From t-Statistics in the Presence of Publication Bias: A Brief Review of Existing Approaches With Some Extensions , 2018 .

[10]  Kate Zernike The Bitter Truth , 2006 .

[11]  H. Pashler,et al.  Is the Replicability Crisis Overblown? Three Arguments Examined , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.

[12]  Jelte M. Wicherts,et al.  Conducting Meta-Analyses Based on p Values , 2016, Perspectives on psychological science : a journal of the Association for Psychological Science.

[13]  Michèle B. Nuijten,et al.  Distributions of p-values smaller than .05 in psychology: what is going on? , 2016, PeerJ.

[14]  Miguel A. Vadillo,et al.  The Bitter Truth About Sugar and Willpower , 2016, Psychological science.

[16]  J Mayer The bitter truth about sugar. , 1977, Pennsylvania dental journal.

[18]  H. Pashler,et al.  Puzzlingly High Correlations in fMRI Studies of Emotion, Personality, and Social Cognition , 2009, Perspectives on psychological science : a journal of the Association for Psychological Science.

[19]  Anton Kühberger,et al.  Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size , 2014, PloS one.

[20]  C. Ferguson,et al.  A Vast Graveyard of Undead Theories , 2012, Perspectives on psychological science : a journal of the Association for Psychological Science.

[21]  J. Simmons,et al.  Power Posing: P-Curving the Evidence , 2016, Psychological science.

[22]  K. Fiedler Voodoo Correlations Are Everywhere—Not Only in Neuroscience , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.

[23]  Frank Renkewitz,et al.  Belastbare und effiziente Wissenschaft: Strategische Ausrichtung von Forschungsprozessen als Weg aus der Replikationskrise [Robust and efficient science: Strategic alignment of research processes as a way out of the replication crisis] , 2018 .

[24]  Leif D. Nelson,et al.  p-Curve and Effect Size , 2014, Perspectives on psychological science : a journal of the Association for Psychological Science.

[25]  A. Kühberger,et al.  A Meta-Analytic Re-Appraisal of the Framing Effect , 2018 .

[26]  Rolf Ulrich,et al.  Some properties of p-curves, with an application to gradual publication bias. , 2018, Psychological methods.

[27]  M. Coltheart,et al.  The quarterly journal of experimental psychology , 1985 .

[28]  John P. A. Ioannidis,et al.  p-Curve and p-Hacking in Observational Research , 2016, PloS one.

[29]  Markus Maier,et al.  Forschungstransparenz als hohes wissenschaftliches Gut stärken [Strengthening research transparency as a valuable scientific asset] , 2018 .

[30]  N. Lazar,et al.  The ASA Statement on p-Values: Context, Process, and Purpose , 2016 .

[31]  T. Sterling Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa , 1959 .

[32]  G. Francis The frequency of excess success for articles in Psychological Science , 2014, Psychonomic bulletin & review.

[33]  Amy J. C. Cuddy,et al.  Review and Summary of Research on the Embodied Effects of Expansive (vs. Contractive) Nonverbal Displays , 2015, Psychological science.

[34]  Rolf Ulrich,et al.  p-hacking by post hoc selection with multiple opportunities: Detectability by skewness test? Comment on Simonsohn, Nelson, and Simmons (2014). , 2015, Journal of experimental psychology. General.

[35]  G. Francis Publication bias and the failure of replication in experimental psychology , 2012, Psychonomic bulletin & review.

[36]  E. Erdfelder,et al.  Zur Methodologie von Replikationsstudien [On the methodology of replication studies] , 2018 .

[37]  R T O'Neill,et al.  The behavior of the P-value when the alternative hypothesis is true. , 1997, Biometrics.

[38]  R. Rosenthal The file drawer problem and tolerance for null results , 1979 .

[39]  Edgar Erdfelder,et al.  Experimental psychology: a note on statistical analysis. , 2010, Experimental psychology.

[40]  Leif D. Nelson,et al.  P-Curve: A Key to the File Drawer , 2013, Journal of experimental psychology. General.

[41]  Leif D. Nelson,et al.  A 21 Word Solution , 2012 .

[42]  Arndt Bröder,et al.  Result-Blind Peer Reviews and Editorial Decisions: A Missing Pillar of Scientific Culture , 2013 .

[43]  Leif D. Nelson,et al.  Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a Reply to Ulrich and Miller (2015). , 2015, Journal of experimental psychology. General.

[44]  Rolf Ulrich,et al.  Inflation von falsch-positiven Befunden in der psychologischen Forschung: Mögliche Ursachen und Gegenmaßnahmen [The inflation of false-positive findings in psychological research: Possible causes and countermeasures] , 2016 .

[45]  Daniel L. Hall,et al.  Integrity of Literature on Expressed Emotion and Relapse in Patients with Schizophrenia Verified by a p-Curve Analysis. , 2017, Family process.

[46]  Ulf Böckenholt,et al.  Adjusting for Publication Bias in Meta-Analysis , 2016, Perspectives on psychological science : a journal of the Association for Psychological Science.

[47]  Patrick Dattalo,et al.  Statistical Power Analysis , 2008 .

[48]  Amy J. C. Cuddy,et al.  P-Curving a More Comprehensive Body of Research on Postural Feedback Reveals Clear Evidential Value for Power-Posing Effects: Reply to Simmons and Simonsohn (2017) , 2017, Psychological science.

[49]  U. Schimmack The ironic effect of significant results on the credibility of multiple-study articles. , 2012, Psychological methods.

[50]  William N. Venables,et al.  Modern Applied Statistics with S , 2010 .

[51]  M. Borenstein  Effect size estimation , 2012 .

[52]  Dorothy V M Bishop,et al.  Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value , 2016, PeerJ.

[53]  J. Brooks  Why most published research findings are false: Ioannidis JP , 2008 .

[54]  Leif D. Nelson,et al.  False-Positive Psychology , 2011, Psychological science.

[55]  Emanuel Schmider,et al.  Is It Really Robust? , 2010 .