Bias from the use of generalized estimating equations to analyze incomplete longitudinal binary data

Patient dropout is a common problem in studies that collect repeated binary measurements. Generalized estimating equations (GEE) are often used to analyze such data. The dropout mechanism may plausibly be missing at random (MAR), i.e., unrelated to future measurements given covariates and past measurements. In this case, various authors have recommended weighted GEE with weights based on an assumed dropout model, an imputation approach, or a doubly robust approach based on both weighting and imputation. These approaches provide asymptotically unbiased inference, provided the dropout or imputation model (as appropriate) is correctly specified. Other authors have suggested that, provided the working correlation structure is correctly specified, GEE using an improved estimator of the correlation parameters (‘modified GEE’) show minimal bias. These modified GEE have not been thoroughly examined. In this paper, we study the asymptotic bias under MAR dropout of these modified GEE, the standard GEE, and GEE using the true correlation. We demonstrate that all three methods are biased in general. The modified GEE may be preferred to the standard GEE and are subject to only minimal bias in many MAR scenarios, but in others they are substantially biased. Hence, we recommend that the modified GEE be used with caution.
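The weighting idea mentioned above can be illustrated with a minimal simulation sketch. It is not the paper's analysis: the two-visit design, all probabilities, and the function names `simulate` and `estimates` are assumptions chosen for illustration. It also uses the true dropout probabilities as weights, whereas in practice they would be estimated from an assumed dropout model. The estimand is the marginal mean of the second binary measurement, for which an unweighted GEE with an independence working correlation reduces to the mean among patients still observed.

```python
import numpy as np

def simulate(n=200_000, seed=0):
    """Simulate two binary visits with MAR dropout before visit 2 (illustrative values)."""
    rng = np.random.default_rng(seed)
    y1 = rng.binomial(1, 0.5, size=n)
    # Visit-2 outcome is positively associated with the visit-1 outcome.
    y2 = rng.binomial(1, np.where(y1 == 1, 0.8, 0.3))
    # MAR: the probability of remaining in the study depends only on the observed y1.
    p_stay = np.where(y1 == 1, 0.9, 0.4)
    observed = rng.binomial(1, p_stay).astype(bool)
    return y1, y2, observed, p_stay

def estimates(y1, y2, observed, p_stay):
    """Return (full-data mean, unweighted observed mean, inverse-probability-weighted mean)."""
    full_data_mean = y2.mean()          # not available in a real study
    naive = y2[observed].mean()         # ignores the dropout mechanism
    w = 1.0 / p_stay[observed]          # assumed-known dropout model
    ipw = np.sum(w * y2[observed]) / np.sum(w)
    return full_data_mean, naive, ipw

if __name__ == "__main__":
    full_data_mean, naive, ipw = estimates(*simulate())
    print(f"full data: {full_data_mean:.3f}  naive: {naive:.3f}  weighted: {ipw:.3f}")
```

Under these assumed values the population mean at visit 2 is 0.55, while the unweighted mean among completers converges to roughly 0.65, because patients with a positive first measurement are both more likely to stay and more likely to have a positive second measurement. The weighted estimate recovers the population mean only because the dropout model used for the weights is correct here.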
