Selection of Working Correlation Structure and Best Model in GEE Analyses of Longitudinal Data

The Generalized Estimating Equations (GEE) method is one of the most commonly used statistical methods for analyzing longitudinal data in epidemiological studies. The method requires the analyst to specify a working correlation structure for the repeated measures of each subject's outcome variable. Because GEE is built on quasi-likelihood theory rather than on a full likelihood, maximum-likelihood-based model selection methods, such as the widely used Akaike Information Criterion (AIC), cannot be applied to it directly, and statistical criteria for selecting the best correlation structure and the best subset of explanatory variables have only recently become available. Pan (2001) proposed one such criterion, the quasi-likelihood under the independence model criterion (QIC), which can be used to select both the correlation structure and the subset of explanatory variables. Based on the QIC method, we developed a Stata program that calculates the QIC value for a range of distributions, link functions, and correlation structures. In this article, we introduce the program and demonstrate, through several representative examples, how to use it to select the most parsimonious model in GEE analyses of longitudinal data.
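
For orientation, the criterion of Pan (2001) is usually written as below. This is a sketch in commonly used notation, none of which appears in the abstract itself: \hat{\beta}(R) is the GEE estimate under working correlation structure R, Q(\cdot; I) is the quasi-likelihood evaluated under the independence working model, \hat{\Omega}_I is the inverse of the model-based covariance estimate under independence, \hat{V}_R is the robust (sandwich) covariance estimate under R, and p is the number of regression parameters.

```latex
\[
\mathrm{QIC}(R) \;=\; -2\, Q\bigl(\hat{\beta}(R);\, I\bigr)
  \;+\; 2\, \operatorname{trace}\bigl(\hat{\Omega}_I\, \hat{V}_R\bigr)
\]
% Simplified variant often used when comparing subsets of explanatory variables:
\[
\mathrm{QIC}_u(R) \;=\; -2\, Q\bigl(\hat{\beta}(R);\, I\bigr) \;+\; 2p
\]
```

Among candidate models fitted to the same data, the one with the smallest QIC value is preferred, in the same way that AIC is used for likelihood-based models.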

[1]  R. W. Wedderburn Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method , 1974 .

[2]  W. J. Langford Statistical Methods , 1959, Nature.

[3]  J. Ware,et al.  Applied Longitudinal Analysis , 2004 .

[4]  Y. Takane,et al.  Estimation of Growth Curve Models with Structured Error Covariances by Generalized Estimating Equations , 2005 .

[5]  G. Casella,et al.  Explaining the Gibbs Sampler , 1992 .

[6]  Guoqi Qian,et al.  Computations and analysis in robust regression model selection using stochastic complexity , 1999, Comput. Stat..

[7]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[8]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[9]  Jost B Jonas,et al.  GEE approaches to marginal regression models for medical diagnostic tests , 2004, Statistics in medicine.

[10]  Guoqi Qian,et al.  Using MCMC for Logistic Regression Model Selection Involving Large Number of Candidate Models , 2002 .

[11]  E Cantoni,et al.  Longitudinal variable selection by cross‐validation in the case of many covariates , 2007, Statistics in medicine.

[12]  H. Akaike A new look at the statistical model identification , 1974 .

[13]  G. Fillenbaum,et al.  Comparison of methods for analyzing longitudinal binary outcomes: cognitive status as an example , 2003, Aging & mental health.

[14]  C S Berkey,et al.  Distribution of forced vital capacity and forced expiratory volume in one second in children 6 to 11 years of age. , 1983, The American review of respiratory disease.

[15]  Gary A. Ballinger,et al.  Using Generalized Estimating Equations for Longitudinal Data Analysis , 2004 .

[16]  Y. Takane,et al.  AN EXTENDED MULTIVARIATE RANDOM-EFFECTS GROWTH CURVE MODEL , 2005 .

[17]  Edward C. Chao,et al.  Generalized Estimating Equations , 2003, Technometrics.

[18]  G. Giles,et al.  After BRCA1 and BRCA2-what next? Multifactorial segregation analyses of three-generation, population-based Australian families affected by female breast cancer. , 2001, American journal of human genetics.

[19]  N. Metropolis,et al.  Equation of State Calculations by Fast Computing Machines , 1953, Resonance.

[20]  S. Zeger,et al.  Longitudinal data analysis using generalized linear models , 1986 .

[21]  P. Diggle Analysis of Longitudinal Data , 1995 .

[22]  J. Hopper,et al.  A prospective longitudinal study of serum testosterone, dehydroepiandrosterone sulfate, and sex hormone-binding globulin levels through the menopause transition. , 2000, The Journal of clinical endocrinology and metabolism.

[23]  P. McCullagh,et al.  Generalized Linear Models , 1992 .

[24]  Robert F. Woolson,et al.  Analysis of categorical incomplete longitudinal data , 1984 .

[25]  D. English,et al.  Segregation analyses of 1,476 population-based Australian families affected by prostate cancer. , 2001, American journal of human genetics.

[26]  W. Pan Akaike's Information Criterion in Generalized Estimating Equations , 2001, Biometrics.

[27]  George Gabor,et al.  Generalised linear model selection by the predictive least quasi-deviance criterion , 1996 .

[28]  Elvezio Ronchetti,et al.  Variable Selection for Marginal Longitudinal Generalized Linear Models , 2003, Biometrics.

[29]  StataCorp Stata Statistical Software: Release 7. College Station, TX , 2001 .