A multi-lab experimental assessment reveals that replicability can be improved by using empirical estimates of genotype-by-lab interaction

The utility of mouse and rat studies critically depends on their replicability in other laboratories. A widely advocated approach to improving replicability is the rigorous control of predefined animal or experimental conditions, known as standardization. However, this approach limits the generalizability of the findings to only the standardized conditions, and it is a potential cause of, rather than a solution to, what has been called a replicability crisis. Alternative strategies include estimating the heterogeneity of effects across laboratories, either through designs that vary testing conditions or by direct statistical analysis of laboratory variation. We previously evaluated our statistical approach for estimating the interlaboratory replicability of a single-laboratory discovery. Those results, however, came from a well-coordinated, multi-lab phenotyping study and did not extend to the more realistic setting in which laboratories operate independently of each other. Here, we sought to test our statistical approach in a realistic prospective experiment, in mice, using 152 results from 5 independent published studies deposited in the Mouse Phenome Database (MPD). In independent replication experiments at 3 laboratories, we found that 53 of the results were replicable, so the other 99 were considered non-replicable. Of the 99 non-replicable results, 59 were statistically significant (at the 0.05 level) in their original single-lab analysis, so the probability of making a single-lab statistical discovery for a non-replicable result was 59.6% (59/99). We then introduced the dimensionless “Genotype-by-Laboratory” (GxL) factor, defined as the ratio of the standard deviation of the GxL interaction to the standard deviation within groups. Adjusting the single-lab analysis by the GxL factor reduced the number of single-lab statistical discoveries and, with it, the probability that a non-replicable result is discovered in the single lab, to 12.1%. Such an adjustment naturally reduces the power to make replicable discoveries, but this reduction was small (from 87% to 66%), a small price to pay for the large improvement in replicability. The tools and data needed for the GxL adjustment are publicly available at the MPD and will become increasingly useful as the range of assays and testing conditions in this resource increases.
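To make the GxL factor concrete, the following minimal Python sketch computes the factor as defined above and applies it to a two-group comparison. The function names and the numbers are hypothetical, chosen only for illustration. The specific form of the adjustment (adding 2γ² times the within-group variance to the variance of the genotype difference) is an assumption about how an interaction term would typically enter such a test; it is not stated in the text above.

```python
import math


def gxl_factor(sd_interaction: float, sd_within: float) -> float:
    """Dimensionless GxL factor: SD of the GxL interaction over the within-group SD."""
    if sd_within <= 0:
        raise ValueError("sd_within must be positive")
    return sd_interaction / sd_within


def t_statistic(mean_a: float, mean_b: float, sd_within: float,
                n_a: int, n_b: int, gamma: float = 0.0) -> float:
    """Two-group t statistic, optionally inflated by a GxL factor `gamma`.

    ASSUMPTION (not from the text): the interaction adds
    2 * gamma**2 * sd_within**2 to the variance of the difference in means.
    With gamma = 0 this is the ordinary pooled-variance t statistic.
    """
    variance = sd_within ** 2 * (1.0 / n_a + 1.0 / n_b + 2.0 * gamma ** 2)
    return (mean_a - mean_b) / math.sqrt(variance)


# Hypothetical inputs: interaction SD 1.0, within-group SD 2.0, 10 animals per genotype.
gamma = gxl_factor(sd_interaction=1.0, sd_within=2.0)
naive_t = t_statistic(12.0, 10.0, sd_within=2.0, n_a=10, n_b=10)
adjusted_t = t_statistic(12.0, 10.0, sd_within=2.0, n_a=10, n_b=10, gamma=gamma)

# Share of non-replicable results that were single-lab discoveries, as reported above.
unadjusted_share = 59 / 99

print(f"GxL factor: {gamma:.2f}")
print(f"t without adjustment: {naive_t:.3f}, with adjustment: {adjusted_t:.3f}")
print(f"Unadjusted share of non-replicable discoveries: {unadjusted_share:.1%}")
```

Under these assumptions the adjusted statistic is always smaller in magnitude than the unadjusted one whenever the GxL factor is positive, which is the mechanism behind both the drop in non-replicable discoveries and the loss of power. A full analysis would presumably also adjust the degrees of freedom, which this sketch leaves out.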
