A comparison of multiple imputation and fully augmented weighted estimators for Cox regression with missing covariates

Several approaches exist for handling missing covariates in the Cox proportional hazards model. Multiple imputation (MI) is relatively easy to implement with widely available software and yields consistent estimates if the imputation model is correct. On the other hand, the fully augmented weighted estimators (FAWEs) recover a substantial proportion of the efficiency and have the doubly robust property. In this paper, we compare the FAWEs and MI through a comprehensive simulation study. For MI, we consider multiple imputation by chained equations and focus on two imputation methods: Bayesian linear regression imputation and predictive mean matching. Simulation results show that the imputation methods can be rather sensitive to model misspecification and may have large bias when the censoring time depends on the missing covariates. In contrast, the FAWEs allow the censoring time to depend on the missing covariates and, owing to the doubly robust property, remain remarkably robust as long as either the conditional expectations or the selection probability is correctly specified. The comparison suggests that the FAWEs have the potential to be a competitive and attractive tool for the analysis of survival data with missing covariates.
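As an illustration of the pooling step that MI analyses typically end with, the following is a minimal sketch of Rubin's rules for a single coefficient, assuming the Cox model has already been fitted to each of the m imputed data sets. The function name `pool_rubin` and the numbers in the usage lines are hypothetical and are not taken from the paper's simulations.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: per-imputation point estimates (e.g. a Cox log hazard ratio)
    variances: per-imputation squared standard errors of that estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical estimates from m = 3 imputed data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total_var)
```

The between-imputation term is what carries the extra uncertainty due to the missing covariates; the pooled estimate is only as good as the imputation model that produced the completed data sets.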
