A Bayes Factor for Replications of ANOVA Results

ABSTRACT With an increasing number of replication studies performed in psychological science, the question of how to evaluate the outcome of a replication attempt deserves careful consideration. By design, Bayesian approaches allow uncertainty and prior information to be incorporated into the analysis of a replication attempt. The Replication Bayes factor, introduced by Verhagen and Wagenmakers (2014), provides quantitative, relative evidence for or against a successful replication. In that work, it was limited to the case of t-tests. In this article, the Replication Bayes factor is extended to F-tests in multigroup, fixed-effect ANOVA designs. Simulations and examples are presented to facilitate understanding and to demonstrate the usefulness of this approach. Finally, the Replication Bayes factor is compared to other Bayesian and frequentist approaches and discussed in the context of replication attempts. R code to calculate Replication Bayes factors and to reproduce the examples in the article is available at https://osf.io/jv39h/.

[1]  S. Chib, et al. Marginal Likelihood From the Metropolis–Hastings Output, 2001.

[2]  Jacob Cohen. Statistical Power Analysis for the Behavioral Sciences, 1969, The SAGE Encyclopedia of Research Design.

[3]  Taylor Francis Online, et al. The American statistician, 1947.

[4]  U. Simonsohn. Small Telescopes, 2014, Psychological science.

[5]  Thomas Goschke, et al. Conflict-Triggered Goal Shielding, 2008, Psychological science.

[6]  S. E. Ahmed, et al. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2008, Technometrics.

[7]  Jeffrey N. Rouder, et al. Default Bayes factors for ANOVA designs, 2012.

[8]  D. Rubin, et al. Contrasts and Effect Sizes in Behavioral Research: A Correlational Approach, 1999.

[9]  Jie W Weiss, et al. Bayesian Statistical Inference for Psychological Research, 2008.

[10]  A. Gelman, et al. Stan, 2015.

[11]  H. Jeffreys, et al. Theory of probability, 1896.

[12]  Felix D. Schönbrodt, et al. Sequential Hypothesis Testing With Bayes Factors: Efficiently Testing Mean Differences, 2017, Psychological methods.

[13]  H. Pashler, et al. Editors’ Introduction to the Special Section on Replicability in Psychological Science, 2012, Perspectives on psychological science: a journal of the Association for Psychological Science.

[14]  G. Cumming, et al. Researchers misunderstand confidence intervals and standard error bars, 2005, Psychological methods.

[15]  Michael C. Frank, et al. Response to Comment on “Estimating the reproducibility of psychological science”, 2016, Science.

[16]  David B. Dunson, et al. Bayesian Data Analysis, 2010.

[17]  Brian A. Nosek, et al. Recommendations for Increasing Replicability in Psychology, 2013.

[18]  Charles S. Bos. A Comparison of Marginal Likelihood Computation Methods, 2002, COMPSTAT.

[19]  N. Lazar, et al. The ASA Statement on p-Values: Context, Process, and Purpose, 2016.

[20]  David S. Leslie, et al. A tutorial on bridge sampling, 2017, Journal of mathematical psychology.

[21]  Bradley P. Carlin, et al. Markov Chain Monte Carlo Methods for Computing Bayes Factors, 2001.

[22]  S. Chib, et al. Understanding the Metropolis-Hastings Algorithm, 1995.

[23]  John K Kruschke, et al. Bayesian data analysis, 2010, Wiley interdisciplinary reviews. Cognitive science.

[24]  David B. Hitchcock, et al. A History of the Metropolis–Hastings Algorithm, 2003.

[25]  Richard McElreath, et al. Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 2015.

[26]  Ian M. Handley, et al. Increasing and decreasing motor and cognitive output: a model of general action and inaction goals, 2008, Journal of personality and social psychology.

[27]  Jeffrey N. Rouder, et al. The philosophy of Bayes’ factors and the quantification of statistical evidence, 2016.

[28]  Timothy D. Wilson, et al. Comment on “Estimating the reproducibility of psychological science”, 2016, Science.

[29]  G. Cumming. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis, 2011.

[30]  John K. Kruschke, et al. Chapter 2 – Introduction: Credibility, Models, and Parameters, 2015.

[31]  Eric-Jan Wagenmakers, et al. Replication Bayes factors from evidence updating, 2018, Behavior Research Methods.

[32]  Michael C. Frank, et al. Estimating the reproducibility of psychological science, 2015, Science.

[33]  Leif D. Nelson, et al. False-Positive Psychology, 2011, Psychological science.

[34]  Jacques Poitevineau, et al. Implementing Bayesian predictive procedures: The K-prime and K-square distributions, 2010, Comput. Stat. Data Anal.

[35]  Daniël Lakens, et al. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs, 2013, Front. Psychol.

[36]  J. Rouder, et al. Default Bayes Factors for Model Selection in Regression, 2012, Multivariate behavioral research.

[37]  J. Vandekerckhove, et al. A Bayesian Perspective on the Reproducibility Project: Psychology, 2016, PloS one.

[38]  D. Bakan, et al. The test of significance in psychological research, 1966, Psychological bulletin.

[39]  D. Lindley, et al. The Analysis of Experimental Data: The Appreciation of Tea and Wine, 1993.

[40]  Bruno Lecoutre, et al. Two useful distributions for Bayesian predictive procedures under normal models, 1999.

[41]  F. Dablander, et al. How to become a Bayesian in eight easy steps: An annotated reading list, 2018, Psychonomic bulletin & review.

[42]  Jeffrey R. Spies, et al. The Replication Recipe: What Makes for a Convincing Replication?, 2014.

[43]  James O. Berger, et al. Rejection odds and rejection ratios: A proposal for statistical practice in testing hypotheses, 2015, Journal of mathematical psychology.

[44]  Gerd Gigerenzer, et al. Do Studies of Statistical Power Have an Effect on the Power of Studies?, 2004.

[45]  Samantha F. Anderson, et al. There's more than one way to conduct a replication study: Beyond statistical significance, 2016, Psychological methods.

[46]  Brian A. Nosek, et al. Power failure: why small sample size undermines the reliability of neuroscience, 2013, Nature Reviews Neuroscience.

[47]  J. Ioannidis, et al. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature, 2017, PLoS biology.

[48]  A. Gelman, et al. The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant, 2006.

[49]  Xiao-Li Meng, et al. Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration, 1996.

[50]  G. Cumming. Replication and p Intervals: p Values Predict the Future Only Vaguely, but Confidence Intervals Do Much Better, 2008, Perspectives on psychological science: a journal of the Association for Psychological Science.

[51]  G. Cumming, et al. A Primer on the Understanding, Use, and Calculation of Confidence Intervals that are Based on Central and Noncentral Distributions, 2001.

[52]  S. Maxwell, et al. Is psychology suffering from a replication crisis? What does "failure to replicate" really mean?, 2015, The American psychologist.

[53]  Scott D. Brown, et al. A purely confirmatory replication study of structural brain-behavior correlations, 2015, Cortex.

[54]  R. Nickerson, et al. Null hypothesis significance testing: a review of an old and continuing controversy, 2000, Psychological methods.

[55]  Z. Dienes. How Bayes factors change scientific practice, 2016.

[56]  Z. Dienes. Bayesian Versus Orthodox Statistics: Which Side Are You On?, 2011, Perspectives on psychological science: a journal of the Association for Psychological Science.

[57]  J. Bargh, et al. Keeping One's Distance, 2008, Psychological science.

[58]  Y. Ritov, et al. Response to the ASA’s Statement on p-Values: Context, Process, and Purpose, 2017.

[59]  J. H. Steiger, et al. Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis, 2004, Psychological methods.

[60]  Eric-Jan Wagenmakers, et al. Bayesian tests to quantify the result of a replication attempt, 2014, Journal of experimental psychology. General.

[61]  Jeffrey N. Rouder, et al. Bayesian t tests for accepting and rejecting the null hypothesis, 2009, Psychonomic bulletin & review.

[62]  Felix D. Schönbrodt, et al. A Bayesian bird's eye view of ‘Replications of important results in social psychology’, 2017, Royal Society Open Science.