Statistical Evidence in Experimental Psychology

Statistical inference in psychology has traditionally relied heavily on p-value significance testing. This approach to drawing conclusions from data, however, has been widely criticized, and two types of remedies have been advocated. The first proposal is to supplement p values with complementary measures of evidence, such as effect sizes. The second is to replace p-value inference with Bayesian measures of evidence, such as the Bayes factor. The authors provide a practical comparison of p values, effect sizes, and default Bayes factors as measures of statistical evidence, using 855 recently published t tests in psychology. The comparison yields two main results. First, although p values and default Bayes factors almost always agree about which hypothesis is better supported by the data, the measures often disagree about the strength of this support; for 70% of the data sets for which the p value falls between .01 and .05, the default Bayes factor indicates that the evidence is only anecdotal. Second, effect sizes can provide evidence beyond that conveyed by p values and default Bayes factors. The authors conclude that the Bayesian approach is comparatively prudent, preventing researchers from overestimating the evidence in favor of an effect.
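
To make the three measures concrete, the following is a minimal sketch that computes all of them from a single one-sample t statistic. It rests on assumptions that are not stated in the abstract: the default Bayes factor is taken to be the JZS Bayes factor of Rouder et al. (2009) with a unit-scale Cauchy prior on effect size, the effect size is Cohen's d = t / sqrt(n), and "anecdotal" is read as a Bayes factor between 1/3 and 3. All function names are illustrative, and the integrals use a plain midpoint rule rather than a statistics library.

```python
import math


def midpoint(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df):
    """Two-sided p value, P(|T| >= |t|), under the null hypothesis."""
    central = 2.0 * midpoint(lambda x: t_pdf(x, df), 0.0, abs(t))
    return min(1.0, max(0.0, 1.0 - central))


def cohens_d_one_sample(t, n):
    """Cohen's d for a one-sample (or paired) design: d = t / sqrt(n)."""
    return t / math.sqrt(n)


def jzs_bf10(t, n):
    """JZS Bayes factor (alternative over null) for a one-sample t test.

    Assumes a Cauchy prior with scale 1 on effect size, written as a
    normal prior whose variance g has an inverse-chi-square(1) prior.
    """
    df = n - 1
    null_like = (1 + t * t / df) ** (-(df + 1) / 2)

    def integrand(u):
        # Substitution g = u / (1 - u) maps g in (0, inf) to u in (0, 1).
        g = u / (1 - u)
        a = 1 + n * g
        like = a ** -0.5 * (1 + t * t / (a * df)) ** (-(df + 1) / 2)
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return like * prior / (1 - u) ** 2  # Jacobian dg/du

    alt_like = midpoint(integrand, 0.0, 1.0)
    return alt_like / null_like


def is_anecdotal(bf10):
    """Assumed convention: evidence is anecdotal if 1/3 < BF10 < 3."""
    return 1 / 3 < bf10 < 3


if __name__ == "__main__":
    t, n = 2.093, 20  # hypothetical result sitting near p = .05
    print("p    =", round(two_sided_p(t, n - 1), 4))
    print("d    =", round(cohens_d_one_sample(t, n), 3))
    print("BF10 =", round(jzs_bf10(t, n), 3))
    print("anecdotal:", is_anecdotal(jzs_bf10(t, n)))
```

Under these assumptions, a hypothetical result that just reaches significance at the .05 level yields a Bayes factor in the anecdotal range, which is the kind of disagreement about evidential strength that the comparison reports.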
