How Many Participants Do We Have to Include in Properly Powered Experiments? A Tutorial of Power Analysis with Reference Tables

Given that an effect size of d = .4 is a good first estimate of the smallest effect size of interest in psychological research, we already need over 50 participants for a simple comparison of two within-participants conditions if we want to run a study with 80% power. This is more than is typical in current practice. Moreover, as soon as a between-groups variable or an interaction is involved, sample sizes of 100, 200, and even more participants are needed. As long as we do not accept these facts, we will keep running underpowered studies with inconclusive results. Addressing the issue requires a change in the way research is evaluated by supervisors, examiners, reviewers, and editors. The present paper gives the participant numbers needed for the designs most often used by psychologists: single-variable between-groups and repeated-measures designs with two and three levels, and two-factor designs involving either two repeated-measures variables or one between-groups and one repeated-measures variable (split-plot design). The numbers are given both for traditional frequentist analysis (p < .05) and for Bayesian analysis (BF > 10). They provide researchers with a standard to determine (and justify) the sample size of an upcoming study. The article also describes how researchers can improve the power of their study by including multiple observations per condition per participant.
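To make the headline figures concrete, the following is a minimal sketch of the underlying frequentist power calculation using Python's statsmodels package (the choice of library is an assumption; the paper itself supplies ready-made reference tables rather than code). It recovers the two numbers quoted above: roughly 50 participants for a within-participants comparison at d = 0.4 with 80% power, and roughly 100 per group for the equivalent between-groups comparison.

```python
# Sketch of the power calculations behind the reference numbers,
# assuming the statsmodels library (not part of the original paper).
from statsmodels.stats.power import TTestPower, TTestIndPower

d, alpha, power = 0.4, 0.05, 0.80  # smallest effect of interest, p < .05, 80% power

# Within-participants comparison: a paired t-test is a one-sample t-test
# on the difference scores, so d here is Cohen's d_z of those differences.
n_within = TTestPower().solve_power(effect_size=d, alpha=alpha,
                                    power=power, alternative='two-sided')
print(f"Within-participants design: {n_within:.0f} participants")  # ~51

# Between-groups comparison: solve_power returns the size of EACH group.
n_between = TTestIndPower().solve_power(effect_size=d, alpha=alpha,
                                        power=power, alternative='two-sided')
print(f"Between-groups design: {n_between:.0f} per group "
      f"({2 * n_between:.0f} in total)")  # ~100 per group, ~200 in total
```

The gap between the two designs illustrates the paper's point: because the within-participants test operates on difference scores, which remove stable between-person variability, far fewer participants suffice than in the between-groups case for the same nominal effect size.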
