A method for increasing the robustness of multiple imputation

Missing data are common wherever statistical methods are applied in practice. They present a problem because they require additional assumptions about the mechanism leading to the incompleteness of the data. By incorporating two models for the missing data process, doubly robust (DR) weighting-based methods offer some protection against misspecification bias, since inferences are valid when at least one of the two models is correctly specified. The balance between robustness, efficiency and analytical complexity is difficult to strike, resulting in a split between the likelihood and multiple imputation (MI) school on one hand and the weighting and DR school on the other. An extension of MI is proposed that, in certain settings, can be shown to give rise to DR estimators. It is conjectured that this additional robustness holds more generally, a conjecture supported by simulation studies. The method is applied to data from the RECORD study, a clinical trial comparing anti-glycaemic combination therapies in type II diabetes patients.
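
To make the double-robustness property concrete, the following is a minimal sketch of the standard augmented inverse probability weighted (AIPW) estimator of a mean, which is one common weighting-based DR estimator. It is not the MI extension proposed here. The simulated data, the helper names `fit_logistic` and `aipw_mean`, and the choice of a logistic missingness model and a linear outcome model are all assumptions made for illustration.

```python
import numpy as np


def fit_logistic(X, r, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (X includes an intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def aipw_mean(x, y, r):
    """AIPW estimate of E[Y] when y is observed only where r == 1."""
    X = np.column_stack([np.ones_like(x), x])
    obs = r == 1
    # Working model 1 (assumed): logistic model for P(R = 1 | X).
    pi = 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, r)))
    # Working model 2 (assumed): linear model for E[Y | X], complete cases only.
    gamma, *_ = np.linalg.lstsq(X[obs], y[obs], rcond=None)
    m = X @ gamma
    # Unobserved outcomes are zeroed out; they carry weight r = 0 anyway.
    y_filled = np.where(obs, y, 0.0)
    # Inverse-probability-weighted term plus the augmentation term.
    return np.mean(r * y_filled / pi - (r - pi) / pi * m)


# Hypothetical simulated data: true E[Y] = 1, outcome missing at random given x.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = (rng.uniform(size=n) < p_obs).astype(float)
y = np.where(r == 1, y_full, np.nan)

cc = np.nanmean(y)        # complete-case mean, biased under this mechanism
est = aipw_mean(x, y, r)  # AIPW estimate
print(f"complete-case mean: {cc:.3f}, AIPW estimate: {est:.3f}")
```

In this sketch both working models happen to be correctly specified. The DR property means the estimator would remain consistent if either one of them, but not both, were replaced by a misspecified model.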
