A general and simple method for obtaining R² from generalized linear mixed‐effects models

The use of both linear and generalized linear mixed‐effects models (LMMs and GLMMs) has become popular not only in the social and medical sciences, but also in the biological sciences, especially in the fields of ecology and evolution. Information criteria, such as the Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed‐effects models. The presentation of ‘variance explained’ (R²) as a relevant summary statistic of mixed‐effects models, however, is rare, even though R² is routinely reported for linear models (LMs) and generalized linear models (GLMs). R² has the extremely useful property of providing an absolute value for the goodness‐of‐fit of a model, which information criteria cannot give. As a summary statistic that describes the amount of variance explained, R² can also be a quantity of biological interest. One reason for the under‐appreciation of R² for mixed‐effects models is that R² can be defined in a number of ways. Furthermore, most definitions of R² for mixed‐effects models have theoretical problems (e.g. decreased or negative R² values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation). Here, we make a case for the importance of reporting R² for mixed‐effects models. We first provide the common definitions of R² for LMs and GLMs and discuss the key problems associated with calculating R² for mixed‐effects models. We then recommend a general and simple method for calculating two types of R² (marginal and conditional R²) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of the software package used for fitting mixed‐effects models. The proposed method has the potential to facilitate the presentation of R² in a wide range of circumstances.
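
The two quantities named above can be illustrated with a minimal sketch for the Gaussian (LMM) case. It assumes, as we understand the approach, that marginal R² is the fixed-effects variance divided by the total variance (fixed effects plus all random effects plus residual), and that conditional R² adds the random-effect variances to the numerator. The function name `r2_mixed`, its arguments, and all numbers are hypothetical; the variance components would normally be taken from a fitted model. GLMMs need additional distribution-specific variance terms that this sketch does not cover.

```python
from statistics import pvariance


def r2_mixed(fixed_pred, random_vars, resid_var):
    """Return (marginal, conditional) R2 from variance components.

    fixed_pred  : predictions from the fixed effects only, one per observation
    random_vars : variance of each random effect (assumed already estimated)
    resid_var   : residual variance (assumed already estimated)
    """
    # Variance attributable to the fixed effects. Population variance is
    # used here as an illustrative choice; sample variance is very similar
    # for large data sets.
    var_fixed = pvariance(fixed_pred)
    var_random = sum(random_vars)
    total = var_fixed + var_random + resid_var
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Hypothetical inputs, not taken from any real model fit.
fixed_pred = [1.0, 2.0, 3.0, 4.0, 5.0]
marginal, conditional = r2_mixed(fixed_pred, [1.0, 0.5], 1.5)
print(f"marginal R2 = {marginal:.2f}, conditional R2 = {conditional:.2f}")
```

Because the random-effect variances are non-negative, conditional R² in this sketch can never fall below marginal R²; the gap between the two indicates how much variance the random effects account for.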
