Clustering in linear‐mixed models with a group fused lasso penalty

A method is proposed for identifying clusters of individuals who show similar patterns when observed repeatedly. We consider linear mixed models, which are widely used for modeling longitudinal data. In contrast to the classical assumption of normally distributed random effects, the random effects are assumed to follow a finite mixture of normal distributions. Typically, the number of mixture components is unknown and has to be chosen, ideally by data-driven tools. For this purpose, we consider an approach based on the EM algorithm that uses a penalized normal mixture as the random-effects distribution. The penalty term, which combines the group lasso and the fused lasso, shrinks the pairwise distances between cluster centers. As a result, individuals with similar time trends are merged into the same cluster. The strength of regularization is controlled by a single penalty parameter, and a new model choice criterion is proposed for selecting its optimal value.
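To make the merging mechanism concrete, here is a minimal sketch under stated assumptions. It assumes the usual notation y_i = X_i β + Z_i b_i + ε_i, with b_i drawn from a K-component normal mixture whose component means (cluster centers) are μ_1, ..., μ_K, and an unweighted group fused lasso penalty λ Σ_{k<l} ‖μ_k − μ_l‖₂. The exact penalty used in the method (for example, any weights) is not given here, so treat this form as an assumption. The helper names `group_fused_penalty`, `fuse_pair`, and `count_clusters` are hypothetical. `fuse_pair` is not the method's EM step: it solves only the two-center proximal subproblem, min ½‖m₁ − a₁‖² + ½‖m₂ − a₂‖² + λ‖m₁ − m₂‖₂, whose closed-form solution keeps the midpoint and group soft-thresholds the difference at 2λ.

```python
import math
from itertools import combinations


def l2(v):
    """Euclidean norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def sub(u, v):
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def group_fused_penalty(centers, lam):
    """Hypothetical helper: lam * sum_{k<l} ||mu_k - mu_l||_2 (unweighted, assumed form)."""
    return lam * sum(l2(sub(mk, ml)) for mk, ml in combinations(centers, 2))


def fuse_pair(a1, a2, lam):
    """Hypothetical helper: closed-form minimizer of
    0.5*||m1 - a1||^2 + 0.5*||m2 - a2||^2 + lam*||m1 - m2||_2.

    The midpoint (a1 + a2)/2 is unchanged; the difference d = a1 - a2 is
    scaled by max(0, 1 - 2*lam/||d||), so the two centers coincide exactly
    whenever ||d|| <= 2*lam.
    """
    mid = [(x + y) / 2 for x, y in zip(a1, a2)]
    d = sub(a1, a2)
    norm_d = l2(d)
    scale = max(0.0, 1.0 - 2.0 * lam / norm_d) if norm_d > 0 else 0.0
    half = [scale * x / 2 for x in d]
    m1 = [c + h for c, h in zip(mid, half)]
    m2 = [c - h for c, h in zip(mid, half)]
    return m1, m2


def count_clusters(centers, tol=1e-8):
    """Number of distinct centers; centers closer than tol count as one cluster."""
    reps = []
    for c in centers:
        if all(l2(sub(c, r)) > tol for r in reps):
            reps.append(c)
    return len(reps)


if __name__ == "__main__":
    # Two illustrative (intercept, slope) centers with similar time trends.
    a1, a2 = [1.0, 0.5], [1.2, 0.5]
    print(group_fused_penalty([a1, a2], lam=1.0))
    # ||a1 - a2|| = 0.2 <= 2 * 0.15, so the centers fuse into one cluster.
    print(count_clusters(fuse_pair(a1, a2, lam=0.15)))
    # ||a1 - a2|| = 0.2 > 2 * 0.05, so the centers are only shrunk together.
    print(count_clusters(fuse_pair(a1, a2, lam=0.05)))
```

Because the penalty uses the L2 norm of the whole difference vector (the group lasso part) rather than absolute differences of single coordinates (the plain fused lasso), all coordinates of two centers, such as intercept and slope, are shrunk and fused together. This is what allows entire time trends, and hence individuals, to end up in the same cluster as λ grows.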
