Variable selection via the composite likelihood method for multilevel longitudinal data with missing responses and covariates

Abstract
Longitudinal data with multilevel structures are commonly collected when subjects grouped in clusters are followed up over a period of time. Such data frequently contain missing values and raise variable selection issues. Ignoring the incompleteness of the data in the analysis may produce biased results. On the other hand, including a large number of irrelevant covariates in inferential procedures can make both computation and interpretation difficult. A unified penalized composite likelihood framework is developed to handle missingness and variable selection together. It is flexible enough to handle situations in which responses and covariates are not missing simultaneously, under the assumption that the data are missing not at random. The method is justified rigorously with theoretical results and numerically with simulation studies, and it is applied to data from the Waterloo Smoking Prevention Project.
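The abstract does not state which penalty or which composite likelihood components the framework uses. As a hedged illustration of the general idea only, a penalized composite likelihood objective is often written as Q(θ) = ℓ_C(θ) − n Σ_j p_λ(|β_j|), where ℓ_C is a composite (for example, pairwise) log-likelihood and p_λ is a sparsity-inducing penalty. The sketch below assumes the SCAD penalty of Fan and Li (2001), one widely used choice, and treats ℓ_C as a number computed elsewhere. The helper names `scad_penalty` and `penalized_objective` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def scad_penalty(beta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lambda(|beta|) (Fan and Li, 2001); a = 3.7 is their suggested default.

    Assumption: SCAD is used here only for illustration; the paper's penalty may differ.
    """
    b = abs(beta)
    if b <= lam:
        # Lasso-like region: penalty grows linearly in |beta|
        return lam * b
    if b <= a * lam:
        # Quadratic transition region that gradually relaxes the shrinkage
        return (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional penalty
    return lam ** 2 * (a + 1) / 2


def penalized_objective(composite_loglik: float, beta: Sequence[float],
                        lam: float, n: int, a: float = 3.7) -> float:
    """Hypothetical helper: composite log-likelihood minus n times the summed SCAD penalties.

    composite_loglik is assumed to be computed elsewhere, e.g. a pairwise
    log-likelihood that already incorporates a missing-data model.
    """
    return composite_loglik - n * sum(scad_penalty(b, lam, a) for b in beta)


if __name__ == "__main__":
    coefs = [0.5, 0.0, 2.0, 10.0]
    # Penalty values for small, zero, moderate and large coefficients
    print([round(scad_penalty(b, lam=1.0), 4) for b in coefs])
    # Objective to be maximized over the coefficients
    print(penalized_objective(-100.0, coefs, lam=1.0, n=50))
```

Maximizing such an objective shrinks small coefficients toward zero while leaving large ones essentially unpenalized. In practice the tuning parameter λ would be chosen by a data-driven criterion (for instance cross-validation or a BIC-type rule); that step is outside this sketch.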