Doubly Regularized REML for Estimation and Selection of Fixed and Random Effects in Linear Mixed-Effects Models

Sijian Wang, Peter X.-K. Song and Ji Zhu

Abstract

The linear mixed-effects model (LMM) is widely used in the analysis of clustered or longitudinal data. In practice, inference on the structure of the random-effects component is of great importance, not only to yield proper interpretation of subject-specific effects but also to draw valid statistical conclusions. This inference becomes considerably more challenging when a large number of fixed effects and random effects are involved in the analysis. The difficulty of variable selection arises from the need to regularize the mean model and the covariance structure simultaneously, with possible parameter constraints between the two. In this paper, we propose a novel method of doubly regularized restricted maximum likelihood (REML) to select fixed and random effects simultaneously in the LMM. The Cholesky decomposition is invoked to ensure the positive-definiteness of the selected covariance matrix of random effects, and the selected random effects are invariant with respect to the ordering of predictors appearing in the Cholesky decomposition. We then develop a new algorithm that solves the related optimization problem effectively, with a computational cost comparable to that of the Newton-Raphson algorithm for maximum likelihood estimation (MLE) or REML in the LMM. We also investigate large-sample properties of the proposed method, including the oracle property. Both simulation studies and data analysis are included for illustration.
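The role of the Cholesky decomposition in random-effects selection can be illustrated with a small sketch. This is not the authors' algorithm or penalty; it only demonstrates the general fact that writing the random-effects covariance as D = L Lᵀ keeps D positive semidefinite for any lower-triangular L, and that setting an entire row of L to zero removes the corresponding random effect (its variance and all of its covariances become zero). The function names and the numeric factor below are hypothetical and chosen purely for illustration.

```python
def covariance_from_cholesky(L):
    """Return D = L L^T for a lower-triangular factor L (list of lists)."""
    q = len(L)
    return [[sum(L[i][k] * L[j][k] for k in range(q)) for j in range(q)]
            for i in range(q)]


def drop_random_effect(L, k):
    """Return a copy of L with row k set to zero.

    Zeroing row k of L zeroes row k and column k of D = L L^T,
    which corresponds to excluding the k-th random effect.
    """
    return [[0.0] * len(row) if i == k else list(row)
            for i, row in enumerate(L)]


def quadratic_form(D, v):
    """Compute v^T D v, which is non-negative whenever D = L L^T."""
    n = len(v)
    return sum(v[i] * D[i][j] * v[j] for i in range(n) for j in range(n))


# Hypothetical Cholesky factor for three random effects (assumed values).
L = [[1.0, 0.0, 0.0],
     [0.5, 2.0, 0.0],
     [-1.0, 0.3, 1.5]]

D_full = covariance_from_cholesky(L)

# Exclude the second random effect (index 1) by zeroing its row in L.
D_sel = covariance_from_cholesky(drop_random_effect(L, 1))
```

In this sketch the entries of D that involve only the retained effects are unchanged, because each entry D[i][j] depends only on rows i and j of L. A penalty that shrinks whole rows of L to zero is one way such a parameterization can be used for selection; the specific penalty and its ordering-invariance property are developed in the paper itself.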

[1]  E. L. Lehmann,et al.  Theory of point estimation , 1950 .

[2]  H. Akaike,et al.  Information Theory and an Extension of the Maximum Likelihood Principle , 1973 .

[3]  D. Harville Bayesian inference for variance components using only error contrasts , 1974 .

[4]  D. Harville Maximum Likelihood Approaches to Variance Component Estimation and to Related Problems , 1977 .

[5]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[6]  J. Ware,et al.  Random-effects models for longitudinal data. , 1982, Biometrics.

[7]  R. Jennrich,et al.  Unbalanced repeated-measures models with structured covariance matrices. , 1986, Biometrics.

[8]  D. Bates,et al.  Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data , 1988 .

[9]  Nan M. Laird,et al.  The Effect of Covariance Structure on Variance Estimation in Balanced Growth-Curve Models with Random Parameters , 1989 .

[10]  N. Breslow,et al.  Approximate inference in generalized linear mixed models , 1993 .

[11]  D. Stram,et al.  Variance components testing in the longitudinal mixed effects model. , 1994, Biometrics.

[12]  L. Breiman Better subset regression using the nonnegative garrote , 1995 .

[13]  Jiming Jiang REML estimation: asymptotic behavior and related topics , 1996 .

[14]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[15]  G. Verbeke,et al.  The effect of misspecifying the random-effects distribution in linear mixed models for longitudinal data , 1997 .

[16]  Xihong Lin Variance component testing in generalised linear models with random effects , 1997 .

[17]  Daniel Commenges,et al.  Generalized Score Test of Homogeneity Based on Correlated Random Effects Models , 1997 .

[18]  S. Chib,et al.  Bayesian Tests and Model Diagnostics in Conditionally Independent Hierarchical Models , 1997 .

[19]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[20]  Daniel B. Hall,et al.  Order‐restricted score tests for homogeneity in generalised linear and nonlinear mixed models , 2001 .

[21]  D. Dunson,et al.  Random Effects Selection in Linear Mixed Models , 2003, Biometrics.

[22]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[23]  Mary Jane Rotheram-Borus,et al.  Six-year intervention outcomes for adolescent children of parents with the human immunodeficiency virus. , 2004, Archives of pediatrics & adolescent medicine.

[24]  Robert E. Weiss,et al.  Modeling Longitudinal Data , 2005 .

[25]  F. Vaida,et al.  Conditional Akaike information for mixed-effects models , 2005 .

[26]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[27]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[28]  Hao Helen Zhang,et al.  Component selection and smoothing in multivariate nonparametric regression , 2006, math/0702659.

[29]  Lan Lan,et al.  Variable Selection in Linear Mixed Model for Longitudinal Data , 2006 .

[30]  P. X. Song,et al.  Correlated data analysis : modeling, analytics, and applications , 2007 .

[31]  Hao Helen Zhang,et al.  Adaptive Lasso for Cox's proportional hazards model , 2007 .

[32]  Runze Li,et al.  Tuning parameter selectors for the smoothly clipped absolute deviation method. , 2007, Biometrika.

[33]  Jiming Jiang Linear and Generalized Linear Mixed Models and Their Applications , 2007 .

[34]  J. S. Rao,et al.  Fence methods for mixed model selection , 2008, 0808.0985.

[35]  Scott D. Foster,et al.  ESTIMATION, PREDICTION AND INFERENCE FOR THE LASSO RANDOM EFFECTS MODEL , 2009 .