Nonlinear multiple imputation for continuous covariate within semiparametric Cox model: application to HIV data in Senegal

Multiple imputation is commonly used to impute missing covariates in the semiparametric Cox regression setting. It replaces each missing value with several plausible values, drawn via a Gibbs sampling procedure that specifies an imputation model for each incomplete variable. The method is implemented in several software packages, which offer imputation models chosen according to the type of the variable to be imputed, but all of these imputation models assume that covariate effects are linear. This assumption often fails in practice, because covariates can have nonlinear effects. A linearity assumption can then lead to misleading conclusions, since the imputation model should reflect the true distributional relationship between the missing values and the observed values. To estimate nonlinear effects of continuous time-invariant covariates in the imputation model, we propose a method based on B-spline functions. To assess its performance, we conducted a simulation study comparing multiple imputation with a Bayesian spline imputation model against multiple imputation with a Bayesian linear imputation model in a survival analysis setting. We also evaluated the proposed method on the motivating data set, collected from HIV-infected patients enrolled in an observational cohort study in Senegal, which contains several incomplete variables. We found that, compared with the linear imputation methods, our method estimates the hazard ratio well when data are missing completely at random or missing at random.
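The contrast between a linear and a B-spline imputation model can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated data, the knot positions, the helper names (`bspline_basis`, `fit_and_impute`) and the use of ordinary least squares with residual noise in place of full Bayesian posterior draws are all assumptions made for illustration, and the survival outcome, which a real imputation model for a Cox analysis would normally include, is left out.

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """B-spline design matrix via the Cox-de Boor recursion (clamped knots assumed)."""
    knots = np.asarray(knots, dtype=float)
    # Keep x inside the half-open knot range so the right boundary is covered.
    x = np.clip(np.asarray(x, dtype=float), knots[0], np.nextafter(knots[-1], knots[0]))
    # Degree 0: indicator of each knot interval.
    B = np.zeros((x.size, len(knots) - 1))
    for j in range(len(knots) - 1):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, len(knots) - 1 - d))
        for j in range(len(knots) - 1 - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_new[:, j] += (x - knots[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (knots[j + d + 1] - x) / right * B[:, j + 1]
        B = B_new
    return B

def fit_and_impute(design, x, miss, rng):
    """Fit on observed rows; return predicted means and one stochastic draw for missing rows.

    Simplification (assumption): coefficients are least-squares estimates, not posterior draws.
    """
    obs = ~miss
    beta, *_ = np.linalg.lstsq(design[obs], x[obs], rcond=None)
    resid = x[obs] - design[obs] @ beta
    sd = resid.std(ddof=design.shape[1])
    mean = design[miss] @ beta
    return mean, mean + rng.normal(0.0, sd, size=mean.size)

# Hypothetical data: covariate x depends nonlinearly on a fully observed covariate z.
rng = np.random.default_rng(0)
n = 500
z = rng.uniform(0.0, 1.0, n)
x = np.sin(2 * np.pi * z) + rng.normal(0.0, 0.3, n)
miss = rng.uniform(size=n) < 0.3  # missing completely at random, about 30%

# Illustrative clamped cubic knot vector with three interior knots.
knots = np.concatenate(([0.0] * 4, [0.25, 0.5, 0.75], [1.0] * 4))
design_linear = np.column_stack([np.ones(n), z])
design_spline = bspline_basis(z, knots)  # columns sum to 1, so no extra intercept

mean_lin, draw_lin = fit_and_impute(design_linear, x, miss, rng)
mean_spl, draw_spl = fit_and_impute(design_spline, x, miss, rng)

rmse_linear = np.sqrt(np.mean((mean_lin - x[miss]) ** 2))
rmse_spline = np.sqrt(np.mean((mean_spl - x[miss]) ** 2))
print(f"RMSE of imputed means, linear model: {rmse_linear:.3f}")
print(f"RMSE of imputed means, spline model: {rmse_spline:.3f}")
```

In a full multiple imputation analysis, the stochastic draw would be repeated to create several completed data sets, the Cox model would be fitted to each, and the estimates would be pooled.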
