Invited review: Reproducible research from noisy data: Revisiting key statistical principles for the animal sciences.

Reproducible results define the very core of scientific integrity in modern research. Yet legitimate concerns have been raised about the reproducibility of research findings, with important implications for the advancement of science and for public support. With statistical practice increasingly becoming an essential component of research efforts across the sciences, this review article highlights the compelling role of statistics in ensuring that research findings in the animal sciences are reproducible, that is, able to withstand close interrogation and independent validation. Statistics provides a formal framework and a practical toolbox that, when properly implemented, can recover signal from noisy data. Yet misconceptions about and misuse of statistics are recognized as top contributing factors to the reproducibility crisis. In this article, we revisit foundational statistical concepts relevant to reproducible research in the context of the animal sciences, raise awareness of common statistical misuse that undermines it, and outline recommendations for statistical practice. Specifically, we emphasize a keen understanding of the data generation process throughout the research endeavor, from thoughtful experimental design and randomization, through rigorous data analysis and inference, to careful wording in communicating research results to peer scientists and society in general. We provide a detailed discussion of core concepts in experimental design, including data architecture, experimental replication, and subsampling, and elaborate on practical implications for properly establishing the scope of inference of research findings. For data analysis, we emphasize proper implementation of mixed models, in terms of both distributional assumptions and specification of fixed and random effects, to explicitly recognize multilevel data architecture. This is critical to ensure that experimental error for treatments of interest is properly recognized and that inference is correctly calibrated. Inferential misinterpretations associated with the use of P-values, both significant and nonsignificant, are clarified, and problems associated with error inflation due to multiple comparisons and selective reporting are illustrated. Overall, we advocate for a responsible practice of statistics in the animal sciences, with an emphasis on continuing quantitative education and interdisciplinary collaboration between animal scientists and statisticians to maximize reproducibility of research findings.
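
As a rough illustration of the error inflation due to multiple comparisons mentioned above, the following minimal sketch computes the probability of at least one false positive among m tests. It is not taken from the review: it assumes the tests are independent, that every null hypothesis is true, and that each test is run at the same level alpha. The function names and the choice of alpha = 0.05 are hypothetical, and the Bonferroni adjustment is shown only as one familiar correction, not as the method the authors recommend.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m tests.

    Assumes (for illustration only) that the m tests are independent,
    that all null hypotheses are true, and that each test uses level alpha.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_level(alpha: float, m: int) -> float:
    """Per-test level under a Bonferroni adjustment (alpha divided by m)."""
    return alpha / m


if __name__ == "__main__":
    alpha = 0.05  # conventional threshold, used here only as an example
    for m in (1, 5, 10, 20):
        unadjusted = familywise_error_rate(alpha, m)
        adjusted = familywise_error_rate(bonferroni_level(alpha, m), m)
        print(f"m={m:2d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

Under these assumptions the unadjusted rate grows quickly with the number of comparisons, whereas the Bonferroni-adjusted rate stays at or below alpha. Real comparisons within one experiment are often correlated, so the exact figures would differ in practice.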
