Valid statistical approaches for clustered data: A Monte Carlo simulation study

The translation of preclinical studies to human applications is associated with a high failure rate, which may be exacerbated by limited training in experimental design and statistical analysis. Nested experimental designs, which occur when data have a multilevel structure (e.g., in vitro: cells within a culture dish; in vivo: rats within a litter), often violate the assumption of independent observations that underlies many traditional statistical techniques. Although previous studies have empirically evaluated the analytic challenges associated with multilevel data, existing work has not focused on the key parameters and design components typically observed in preclinical research. To address this knowledge gap, a Monte Carlo simulation study was conducted to systematically assess the effects of inappropriately modeling multilevel data via a fixed-effects ANOVA in studies with sparse observations, no between-group comparisons within a single cluster, and interactive effects. Simulation results revealed a dramatic increase in the probability of a Type I error and in the relative bias of the standard error as the number of level-1 units (e.g., cells; rats) per cell increased in the fixed-effects ANOVA; these effects were largely attenuated when the nesting was appropriately accounted for via a random-effects ANOVA. Thus, failure to account for a nested experimental design may lead to reproducibility challenges and inaccurate conclusions. Appropriately accounting for multilevel data, however, may enhance statistical reliability, thereby improving translatability. Valid analytic strategies are provided for a variety of design scenarios.
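The Type I error inflation described above can be reproduced with a small Monte Carlo sketch. This is not the authors' simulation code: it assumes a hypothetical balanced design with two conditions, a few clusters (e.g., litters) per condition, no true treatment effect, and a hypothetical intraclass correlation (icc); the names simulate_type1, one_way_anova_p, and f_sf are illustrative. The naive analysis runs a fixed-effects one-way ANOVA on every level-1 observation. As a stand-in for the random-effects ANOVA, it runs the same F-test on cluster means, which for a balanced design with each cluster in a single condition gives the same F statistic as the nested-ANOVA test of the condition effect. Only the standard library is used; the F tail probability comes from a Numerical Recipes-style continued fraction for the regularized incomplete beta function.

```python
import math
import random

TINY = 1e-300  # guards against division by zero in Lentz's method


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction for step m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > TINY else TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > TINY else TINY
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) variate."""
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def one_way_anova_p(groups):
    """Classical fixed-effects one-way ANOVA F-test; returns the p-value."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    df1, df2 = k - 1, n - k
    return f_sf((ss_between / df1) / (ss_within / df2), df1, df2)


def simulate_type1(n_clusters=4, n_per_cluster=10, icc=0.3,
                   n_reps=2000, alpha=0.05, seed=1):
    """Empirical Type I error: naive ANOVA vs. cluster-means (nested) F-test.

    All defaults are hypothetical; total variance is fixed at 1, so
    icc = cluster variance / (cluster variance + residual variance).
    """
    rng = random.Random(seed)
    tau = math.sqrt(icc)          # level-2 (cluster) SD
    sigma = math.sqrt(1.0 - icc)  # level-1 (residual) SD
    naive_hits = nested_hits = 0
    for _ in range(n_reps):
        obs_by_group, means_by_group = [], []
        for _condition in range(2):  # two conditions, no true effect
            obs, cluster_means = [], []
            for _cluster in range(n_clusters):
                u = rng.gauss(0.0, tau)  # shared cluster (e.g., litter) effect
                ys = [u + rng.gauss(0.0, sigma) for _ in range(n_per_cluster)]
                obs.extend(ys)
                cluster_means.append(sum(ys) / len(ys))
            obs_by_group.append(obs)
            means_by_group.append(cluster_means)
        naive_hits += one_way_anova_p(obs_by_group) < alpha
        nested_hits += one_way_anova_p(means_by_group) < alpha
    return naive_hits / n_reps, nested_hits / n_reps


if __name__ == "__main__":
    naive, nested = simulate_type1()
    print(f"fixed-effects ANOVA on all observations: {naive:.3f}")
    print(f"cluster-means (nested) F-test:           {nested:.3f}")
```

Raising n_per_cluster while holding icc fixed should push the naive rejection rate further above alpha, in line with the familiar design effect 1 + (m − 1)ρ for m level-1 units per cluster, while the cluster-means test stays near alpha. The cluster-means shortcut holds only for balanced designs; unbalanced data, covariates, or crossed factors call for a full mixed model (e.g., R's nlme or lme4).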
