Safeguard Power as a Protection Against Imprecise Power Estimates

An essential first step in planning a confirmatory or a replication study is to determine the sample size necessary to draw statistically reliable inferences using power analysis. A key problem, however, is that all that is available is the sample-based estimate of the effect size, and its use can lead to severely underpowered studies when the effect size is overestimated. As a potential remedy, we introduce safeguard power analysis, which uses the uncertainty in the estimate of the effect size to achieve a better likelihood of correctly identifying the population effect size. Using a lower-bound estimate of the effect size, in turn, allows researchers to calculate a sample size for a replication study that helps protect it from being underpowered. We show that in most common instances, compared with nominal power, safeguard power is higher whereas standard power is lower. We additionally recommend using safeguard power analysis to evaluate the strength of the evidence provided by the original study.
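To make the logic concrete, the following is a minimal sketch in Python of how a safeguard power analysis could proceed for a two-group comparison: take the observed Cohen's d from the original study, replace it with the lower limit of a two-sided 60% confidence interval (so the population effect exceeds it with roughly 80% probability), and feed that safeguard value into a standard power analysis. The helper safeguard_d, the example numbers, and the normal approximation to the standard error of d are illustrative assumptions, not the paper's exact procedure; the sample-size solving uses statsmodels' TTestIndPower.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

def safeguard_d(d_obs, n1, n2, assurance=0.80):
    """Lower-bound ("safeguard") estimate of Cohen's d.

    Returns the lower limit of a two-sided 60% CI on d, which the
    population effect exceeds with probability `assurance` (0.80).
    Uses the common normal approximation to the SE of d; the paper's
    method is based on exact confidence intervals.
    """
    # Normal-approximation standard error of d
    se = np.sqrt((n1 + n2) / (n1 * n2) + d_obs**2 / (2 * (n1 + n2)))
    z = stats.norm.ppf(assurance)  # 80% assurance -> z ~ 0.84
    return d_obs - z * se

# Hypothetical original study: d = 0.50 with 40 participants per group
d_obs, n1, n2 = 0.50, 40, 40
d_safe = safeguard_d(d_obs, n1, n2)

analysis = TTestIndPower()
n_standard = analysis.solve_power(effect_size=d_obs, power=0.80, alpha=0.05)
n_safeguard = analysis.solve_power(effect_size=d_safe, power=0.80, alpha=0.05)
print(f"safeguard d = {d_safe:.3f}")
print(f"n per group, standard power analysis:  {np.ceil(n_standard):.0f}")
print(f"n per group, safeguard power analysis: {np.ceil(n_safeguard):.0f}")
```

With these hypothetical numbers, the safeguard analysis asks for a substantially larger replication sample than a naive analysis that takes d = 0.50 at face value, which is exactly the protection against overestimated effect sizes that the abstract describes.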
