Robustness of linear mixed‐effects models to violations of distributional assumptions

Linear mixed‐effects models are powerful tools for analysing complex datasets with repeated or clustered observations, a common data structure in ecology and evolution. Mixed‐effects models involve complex fitting procedures and make several assumptions, in particular about the distributions of residuals and random effects. Violations of these assumptions are common in real datasets, yet it is not always clear how much these violations matter for accurate and unbiased estimation.

Here we address the consequences of violations of distributional assumptions and the impact of missing random effect components on model estimates. In particular, we evaluate the effects of skewness, bimodality and heteroscedasticity in random effects and residuals, of missing random effect terms and of correlated fixed effect predictors. We focus on bias and prediction error in estimates of fixed and random effects.

Model estimates were usually robust to violations of assumptions, with the exception of slight upward biases in estimates of random effect variance when the generating distribution was bimodal but was modelled as Gaussian. Further, estimates for (random effect) components that violated distributional assumptions became less precise but remained unbiased. This loss of precision did not affect other parameters of the model. The same pattern was found for strongly correlated fixed effects, which led to imprecise but unbiased estimates, with uncertainty estimates reflecting the imprecision. Unmodelled sources of random effect variance had predictable effects on variance component estimates. The pattern is best viewed as a cascade of hierarchical grouping factors: variances trickle down the hierarchy such that missing higher‐level random effect variances pool at lower levels, and missing lower‐level and crossed random effect variances manifest as residual variance.

Overall, our results show remarkable robustness of mixed‐effects models, which should allow researchers to use them even if the distributional assumptions are objectively violated. However, this does not free researchers from careful evaluation of the model. Estimates based on data that show clear violations of key assumptions should be treated with caution, because individual datasets might give highly imprecise estimates even if the estimates are unbiased on average across datasets.
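
The cascade of variance components described above can be illustrated with a small simulation. The sketch below is not the authors' simulation code. It assumes a balanced two‐level nested design (individuals within groups), arbitrary illustrative variances, and classical ANOVA (method‐of‐moments) estimators in place of a likelihood‐based mixed‐model fit; all names and values are hypothetical.

```python
import random
import statistics

# Illustrative (assumed) design and true variances; not taken from the paper.
N_GROUPS, N_IND, N_OBS = 200, 5, 4
VAR_GROUP, VAR_IND, VAR_RES = 1.0, 0.5, 0.25


def simulate(seed=1):
    """Return {group: {individual: [observations]}} for a balanced nested design."""
    rng = random.Random(seed)
    data = {}
    for g in range(N_GROUPS):
        group_effect = rng.gauss(0.0, VAR_GROUP ** 0.5)
        data[g] = {}
        for i in range(N_IND):
            ind_effect = rng.gauss(0.0, VAR_IND ** 0.5)
            data[g][i] = [
                group_effect + ind_effect + rng.gauss(0.0, VAR_RES ** 0.5)
                for _ in range(N_OBS)
            ]
    return data


def nested_components(data):
    """ANOVA estimates (group, individual, residual) with both levels modelled."""
    # Pooled within-individual variance estimates the residual variance.
    ms_res = statistics.fmean(
        [statistics.variance(obs) for inds in data.values() for obs in inds.values()]
    )
    # Variance of individual means within each group, scaled to a mean square.
    ind_means = {
        g: [statistics.fmean(obs) for obs in inds.values()]
        for g, inds in data.items()
    }
    ms_ind = N_OBS * statistics.fmean(
        [statistics.variance(m) for m in ind_means.values()]
    )
    # Variance of group means, scaled to a mean square.
    group_means = [statistics.fmean(m) for m in ind_means.values()]
    ms_group = N_IND * N_OBS * statistics.variance(group_means)
    return (
        (ms_group - ms_ind) / (N_IND * N_OBS),
        (ms_ind - ms_res) / N_OBS,
        ms_res,
    )


def individual_only_components(data):
    """ANOVA estimates (individual, residual) when the group level is omitted."""
    all_obs = [obs for inds in data.values() for obs in inds.values()]
    ms_res = statistics.fmean([statistics.variance(obs) for obs in all_obs])
    # Individuals are treated as one pool, ignoring which group they belong to.
    ms_ind = N_OBS * statistics.variance([statistics.fmean(obs) for obs in all_obs])
    return (ms_ind - ms_res) / N_OBS, ms_res


data = simulate()
full_group, full_ind, full_res = nested_components(data)
naive_ind, naive_res = individual_only_components(data)

print(f"both levels modelled: group={full_group:.2f}, "
      f"individual={full_ind:.2f}, residual={full_res:.2f}")
print(f"group level omitted:  individual={naive_ind:.2f}, residual={naive_res:.2f}")
```

In a balanced design like this one, the individual‐level estimate from the reduced model is close to the sum of the group and individual estimates from the full model, while the residual estimate is unchanged. This is the "pooling at the lower level" pattern in its simplest form; unbalanced data or a likelihood‐based fit may behave somewhat differently.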
