Simultaneous model selection and estimation for mean and association structures with clustered binary data

This paper investigates the properties of penalized estimating equations when both the mean and association structures are modelled. To select variables for the mean and association structures sequentially, we propose a hierarchical penalized generalized estimating equations (HPGEE2) approach. The first set of penalized estimating equations is solved to select the significant mean parameters. Conditional on the selected mean model, the second set of penalized estimating equations is solved to select the significant association parameters. The hierarchical approach is designed to accommodate possible model constraints relating the inclusion of covariates in the mean model to their inclusion in the association model. This two-step penalization strategy also eases the computational burden compared with solving the two sets of penalized equations simultaneously. HPGEE2 with a smoothly clipped absolute deviation (SCAD) penalty is shown to have the oracle property for both the mean and association models, and the asymptotic behavior of the penalized estimator under this hierarchical approach is established. An efficient two-stage penalized weighted least squares algorithm is developed to implement the proposed method. The empirical performance of the proposed HPGEE2 is demonstrated through Monte Carlo studies and the analysis of a clinical data set.
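
For readers unfamiliar with the SCAD penalty named above, the following is a minimal sketch of its first derivative in the standard form introduced by Fan and Li (2001). The abstract does not state the parameterization used in HPGEE2, so the function name `scad_derivative`, the tuning parameter `lam`, and the default `a = 3.7` are assumptions made here for illustration, not the paper's implementation.

```python
def scad_derivative(theta, lam, a=3.7):
    """First derivative of the SCAD penalty at |theta| (standard form).

    Assumed form (not taken from the paper itself):
        p'(t) = lam                       if t <= lam
        p'(t) = max(a*lam - t, 0)/(a - 1) if t >  lam
    where t = |theta|, lam > 0 and a > 2.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if a <= 2:
        raise ValueError("a must be greater than 2")
    t = abs(theta)
    if t <= lam:
        # Small coefficients receive the full, lasso-like penalty rate.
        return lam
    # The rate decays linearly and reaches zero once t >= a * lam,
    # so large coefficients are left unpenalized.
    return max(a * lam - t, 0.0) / (a - 1.0)


if __name__ == "__main__":
    # Evaluate the penalty rate at a small, a moderate and a large coefficient.
    for value in (0.5, 2.0, 10.0):
        print(value, scad_derivative(value, lam=1.0))
```

The rate is constant for small coefficients and vanishes for large ones. That vanishing rate is what lets SCAD-type estimators avoid over-shrinking large effects, which is the behavior behind the oracle property mentioned in the abstract.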
