Multiply Robust Estimation in Regression Analysis With Missing Data

Doubly robust estimators are widely used in missing-data analysis because they remain consistent if either of two working models is correctly specified. However, they allow only one model for the missingness mechanism and one model for the data distribution, and the assumption that one of these two models is correctly specified is restrictive in practice. For regression analysis with a possibly missing outcome, we propose an estimation method that allows multiple models for both the missingness mechanism and the data distribution. The resulting estimator is consistent if any one of those multiple models is correctly specified, and thus provides multiple protection on consistency. The estimator is also robust against extreme values of the fitted missingness probability, which for most doubly robust estimators can produce erroneously large inverse probability weights that jeopardize numerical performance. We discuss the numerical implementation of the proposed method through a modified Newton–Raphson algorithm. We derive the asymptotic distribution of the resulting estimator, use it to study estimation efficiency, and provide ways to improve the efficiency. As an application, we analyze data collected from the AIDS Clinical Trials Group Protocol 175.
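
For readers less familiar with the doubly robust baseline that this work generalizes, the following is a minimal sketch of the standard augmented inverse probability weighted (AIPW) estimator of a population mean. It is not the multiply robust estimator proposed here, and it does not implement the modified Newton–Raphson algorithm. The simulated data, the working models `pi_true`, `pi_wrong`, `a_true`, `a_wrong`, and the function names are all hypothetical choices made for illustration, and the working models are fixed functions rather than fitted ones.

```python
import math
import random


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(n, seed=0):
    """Hypothetical data: X ~ N(0,1), Y = 1 + 2X + noise, so E[Y] = 1.
    R = 1 means Y is observed; P(R = 1 | X) = expit(0.5 + X)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        r = 1 if rng.random() < expit(0.5 + x) else 0
        rows.append((x, y, r))
    return rows


def aipw_mean(rows, pi, a):
    """AIPW estimate of E[Y] given a missingness model pi(x) and an
    outcome model a(x). Y enters only through r * y, so values of Y
    with r = 0 are never used."""
    total = 0.0
    for x, y, r in rows:
        p = pi(x)
        total += r * y / p - (r - p) / p * a(x)
    return total / len(rows)


# Assumed working models (illustrative only).
pi_true = lambda x: expit(0.5 + x)   # correct missingness model
pi_wrong = lambda x: 0.5             # ignores X
a_true = lambda x: 1.0 + 2.0 * x     # correct outcome regression
a_wrong = lambda x: 0.0              # ignores X

data = simulate(20000, seed=0)
est_pi_ok = aipw_mean(data, pi_true, a_wrong)      # only pi correct
est_a_ok = aipw_mean(data, pi_wrong, a_true)       # only a correct
est_both_wrong = aipw_mean(data, pi_wrong, a_wrong)

# Largest inverse probability weight among observed cases under pi_true.
max_weight = max(1.0 / pi_true(x) for x, _, r in data if r == 1)

print(est_pi_ok, est_a_ok, est_both_wrong, max_weight)
```

In this sketch the first two estimates should land near the true mean of 1, while the third is biased because neither working model is correct. The multiply robust approach described above extends this protection to several candidate models of each type, and `max_weight` hints at why small fitted probabilities can destabilize estimators that use inverse probability weights directly.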
