Power to the People: Power, Negative Results and Sample Size.

The practical application of statistical power is becoming an increasingly important part of experimental design, data analysis, and reporting. Power is essential to estimating sample size when planning studies and obtaining ethical approval for them. Furthermore, power is essential for publishing and interpreting negative results. In this manuscript, we review what power is, how it can be calculated, and what should be reported if a null result is found. Power can be thought of as reflecting the signal-to-noise ratio of an experiment. The conventional wisdom that statistical power is driven by sample size (which increases the signal in the data), while true, is a misleading oversimplification. Relatively little discussion covers the use of experimental designs that control and reduce noise. Even small improvements in experimental design can achieve high power at much lower sample sizes than (for instance) a simple t test. Failure to report the experimental design or the proposed statistical test on animal care and use protocols creates a dilemma for IACUCs, because it is unknown whether sample size has been correctly calculated. Traditional power calculations, which are primarily provided for animal number justifications, are only available for simple, yet low-powered, experimental designs, such as paired t tests. Thus, in most controlled experimental studies, the only analyses for which power can be calculated are those that inherently have low statistical power; these analyses should not be used because they require more animals than necessary. We provide suggestions for more powerful experimental designs (such as randomized block and factorial designs), and we describe methods to easily calculate sample size for these designs that are suitable for IACUC number justifications. Finally, we provide recommendations for reporting negative results, so that readers and reviewers can determine whether an experiment had sufficient power.
The use of more sophisticated designs in animal experiments will inevitably improve power and reproducibility and reduce animal use.
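The signal-to-noise argument above can be illustrated with a rough calculation. The sketch below is not the method described in the manuscript: it uses the standard normal-approximation formula for a two-group comparison, n per group ≈ 2((z_{1-α/2} + z_{power})·σ/δ)², and every number in it (the effect size `delta` and the two standard deviations) is a hypothetical value chosen for illustration. It assumes that blocking on a nuisance factor such as cage or litter removes the between-block variance from the error term, so that only the residual standard deviation counts as noise.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate animals per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
    delta is the difference to detect (signal); sigma is the error SD (noise).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


# Hypothetical values, for illustration only.
delta = 1.0              # treatment difference we want to detect
sd_between_blocks = 1.2  # nuisance variation (e.g. cage, litter, batch)
sd_residual = 0.8        # variation left over within a block

# Simple unblocked comparison: nuisance variation stays in the error term.
sd_unblocked = sqrt(sd_between_blocks**2 + sd_residual**2)
n_simple = n_per_group(delta, sd_unblocked)

# Blocked design (assumption): between-block variation is removed from the error.
n_blocked = n_per_group(delta, sd_residual)

print(f"unblocked design: {n_simple} animals per group")
print(f"blocked design:   {n_blocked} animals per group")
```

With the same effect size, the required sample size scales with the error variance, so any design that removes noise lowers the number of animals needed. The normal approximation slightly underestimates n for small samples, and a blocked analysis spends degrees of freedom on the block term, so a real protocol should use a calculation matched to the planned analysis.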
