Weighted Generalized Estimating Functions for Longitudinal Response and Covariate Data That Are Missing at Random

Longitudinal studies often yield incomplete response and covariate data. It is well known that naive analyses of the available data can be biased, but the size of the bias depends on how frequently data are missing and on how strongly the response variables and covariates are associated with the missing-data indicators. Various factors may influence whether response and covariate data are available at scheduled assessment times; at any given assessment the response may be missing, the covariate data may be missing, or both may be missing. We show that it is important to account for the association between the missing-data indicators for these two processes through joint models. Inverse probability-weighted generalized estimating equations offer an appealing way to do this, and we develop these equations for a particular model generating intermittently missing-at-random data. Empirical studies demonstrate that the consistent estimators arising from the proposed methods have very small empirical biases in moderate samples. Supplemental materials are available online.
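
The idea behind inverse probability weighting can be illustrated in a much simpler setting than the one treated in the paper. The sketch below is not the proposed method: it assumes a single cross-sectional binary response with a fully observed binary covariate, and all names and probabilities (`simulate`, `naive_mean`, `ipw_mean`, the values 0.2, 0.8, 0.9, 0.3) are hypothetical choices made for illustration. The response is missing at random given the covariate, each observed response is weighted by the inverse of its estimated probability of being observed, and the result is compared with the complete-case mean.

```python
import random


def simulate(n, seed=0):
    """Draw (x, y, r) triples; y is treated as observed only when r == 1.

    Assumed data-generating model (illustrative only):
      X ~ Bernoulli(0.5)
      P(Y = 1 | X) = 0.8 if X == 1 else 0.2
      P(R = 1 | X) = 0.3 if X == 1 else 0.9   (missing at random given X)
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (0.8 if x else 0.2) else 0
        r = 1 if rng.random() < (0.3 if x else 0.9) else 0
        rows.append((x, y, r))
    return rows


def naive_mean(rows):
    """Complete-case mean: average of y over units with r == 1."""
    observed = [y for _, y, r in rows if r == 1]
    return sum(observed) / len(observed)


def ipw_mean(rows):
    """Inverse probability-weighted mean of y.

    pi(x) = P(R = 1 | X = x) is estimated by the observed proportion
    within each covariate level; each observed y is weighted by 1 / pi(x).
    """
    pi_hat = {}
    for level in (0, 1):
        indicators = [r for x, _, r in rows if x == level]
        pi_hat[level] = sum(indicators) / len(indicators)
    return sum(r * y / pi_hat[x] for x, y, r in rows) / len(rows)


rows = simulate(20000)
# Under the assumed model the true mean of Y is 0.5, whereas the
# complete-case mean converges to 0.35.
print("naive:", round(naive_mean(rows), 3))
print("ipw:  ", round(ipw_mean(rows), 3))
```

In this toy setting the weights depend on one fully observed covariate. The setting described above is harder because the covariates themselves may be missing, which is why the missing-data indicators for the response and the covariates are modeled jointly.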
