Variable Selection for Marginal Longitudinal Generalized Linear Models

Variable selection is an essential part of any statistical analysis, yet it has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C_p (GC_p) suitable for use with both parametric and nonparametric models. GC_p provides an estimate of a measure of a model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using generalized estimating equations, GEE) and contrast the results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach and highlights some important robustness features inherent to GC_p.
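
For orientation, the classical criterion that GC_p generalizes can be written as below. This is the standard linear-model form of Mallows's C_p, not the generalized criterion proposed in the article, whose exact form is not given in this abstract. The symbols are assumptions introduced here for illustration: n is the number of observations, p the number of parameters in the candidate submodel, RSS_p its residual sum of squares, and \hat{\sigma}^2 an estimate of the error variance, usually taken from the full model.

```latex
\[
  C_p \;=\; \frac{\mathrm{RSS}_p}{\hat{\sigma}^2} \;-\; n \;+\; 2p ,
  \qquad
  \mathrm{RSS}_p \;=\; \sum_{i=1}^{n} \bigl( y_i - \hat{y}_i^{(p)} \bigr)^2 .
\]
```

Under this classical form, a submodel with little bias has C_p close to p, so candidate models are typically compared by looking for small values of C_p that lie near p.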
