Non-parametric approach for frequentist multiple imputation in survival analysis with missing covariates

In clinical and epidemiological studies using survival analysis, values of some explanatory variables are often missing. Multiple imputation (MI) is frequently used to handle such data in practice. However, simple parametric imputation models are routinely adopted without checking whether they are correctly specified, and misspecified imputation models can yield biased parameter estimates. In this study, we describe novel frequentist-type MI procedures for survival analysis using proportional hazards and additive hazards models. The procedures are based on non-parametric estimation techniques and do not require correct specification of a parametric imputation model. For continuous missing covariates, we first sample imputation values from a parametric imputation model and then obtain estimates by solving an estimating equation modified by non-parametrically estimated conditional densities. For categorical missing covariates, we sample imputation values directly from a non-parametrically estimated conditional distribution and then obtain estimates by solving the corresponding estimating equation. We evaluate the performance of the proposed procedures in two simulation studies: one uses simulated data, and the other uses data generated from parameters derived from a real-world medical claims database. We also apply the procedures to a pharmacoepidemiological study examining the effect of antihyperlipidemics on the incidence of hyperglycemia.
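The continuous-covariate procedure can be illustrated with a minimal NumPy sketch. Every specific here is an assumption for illustration, not the authors' method. The data are simulated, with one missing continuous covariate `X` and one always-observed covariate `W`. The parametric proposal is a deliberately misspecified linear-normal model. The abstract does not name its non-parametric density estimator, so a Gaussian product-kernel estimate of the density of `X` given `(W, log T, delta)` stands in for it. The "modified estimating equation" is read as a Cox score in which each missing subject contributes `M` pseudo-records weighted by normalised importance weights `f_hat / q`, in the style of fractional imputation. The names `weighted_cox`, `h`, and `M` are hypothetical.

```python
import numpy as np

def weighted_cox(t, d, Z, wt, n_iter=50):
    """Newton-Raphson on the weighted Cox partial-likelihood score (Breslow handling of ties)."""
    order = np.argsort(-t, kind="stable")             # decreasing time, so cumulative sums are risk sets
    t, d, Z, wt = t[order], d[order], Z[order], wt[order]
    last = np.searchsorted(-t, -t, side="right") - 1  # last sorted index with t_k >= t_i
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        r = wt * np.exp(Z @ beta)
        S0 = np.cumsum(r)[last]
        S1 = np.cumsum(r[:, None] * Z, axis=0)[last]
        S2 = np.cumsum(r[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)[last]
        Zbar = S1 / S0[:, None]
        ev = wt * d                                   # weighted event indicators
        U = (ev[:, None] * (Z - Zbar)).sum(axis=0)    # weighted score at the current beta
        info = (ev[:, None, None] * (S2 / S0[:, None, None]
                                     - Zbar[:, :, None] * Zbar[:, None, :])).sum(axis=0)
        step = np.linalg.solve(info, U)
        beta = beta + step
        if np.abs(step).max() < 1e-10:
            break
    return beta, U

def g(a):                                             # unnormalised Gaussian kernel, bandwidth h
    return np.exp(-0.5 * (a / h) ** 2)

# Hypothetical simulated data: X is missing at random given (W, T, delta).
rng = np.random.default_rng(0)
n, M, h = 600, 30, 0.3                                # sample size, draws per subject, bandwidth
W = rng.normal(size=n)
X = np.sin(2 * W) + 0.5 * rng.normal(size=n)          # true X | W is non-linear
beta_true = np.array([0.7, -0.5])                     # log hazard ratios for (X, W)
T = rng.exponential(1 / np.exp(beta_true[0] * X + beta_true[1] * W))
C = rng.exponential(2.0, size=n)
time, delta = np.minimum(T, C), (T <= C).astype(float)
miss = rng.random(n) < 1 / (1 + np.exp(-(-1.5 + W + delta)))
cc, mi = ~miss, miss

# Step 1: sample from a misspecified parametric model X | W ~ N(a + bW, s^2).
A = np.column_stack([np.ones(cc.sum()), W[cc]])
coef = np.linalg.lstsq(A, X[cc], rcond=None)[0]
sd = 1.5 * np.std(X[cc] - A @ coef)                   # widened so the proposal covers the target
mu = coef[0] + coef[1] * W[mi]
xd = mu[:, None] + sd * rng.normal(size=(mi.sum(), M))

# Step 2: importance weights f_hat(x | W, log T, delta) / q(x | W); f_hat is a kernel
# estimate from complete cases sharing delta. Constants cancel after normalisation.
lt = np.log(time)
Kobs = g(W[mi][:, None] - W[cc]) * g(lt[mi][:, None] - lt[cc]) * (delta[mi][:, None] == delta[cc])
fhat = np.einsum("mjc,mc->mj", g(xd[:, :, None] - X[cc]), Kobs)
q = np.exp(-0.5 * ((xd - mu[:, None]) / sd) ** 2)
wts = fhat / q
s = wts.sum(axis=1, keepdims=True)
wts = np.divide(wts, s, out=np.full_like(wts, 1.0 / M), where=s > 0)

# Step 3: complete cases (weight 1) plus weighted pseudo-records; solve the weighted score.
t_all = np.concatenate([time[cc], np.repeat(time[mi], M)])
d_all = np.concatenate([delta[cc], np.repeat(delta[mi], M)])
Z_all = np.column_stack([np.concatenate([X[cc], xd.ravel()]),
                         np.concatenate([W[cc], np.repeat(W[mi], M)])])
w_all = np.concatenate([np.ones(cc.sum()), wts.ravel()])
beta_hat, score = weighted_cox(t_all, d_all, Z_all, w_all)
print(beta_hat)
```

Replacing `wts` with uniform weights `1/M` gives ordinary imputation from the misspecified linear model. That makes a natural baseline for checking what the density-ratio correction changes.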

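For categorical missing covariates, the abstract samples directly from a non-parametrically estimated conditional distribution. The following NumPy sketch pairs that step with the additive hazards model, using the closed-form Lin and Ying estimating equation. Several assumptions here are ours and do not come from the abstract. A simplified least-squares probabilistic classifier stands in for the unspecified non-parametric estimator; it uses a Gaussian kernel basis centred at every complete case, with hand-picked `sigma` and `lam`. The conditioning set is `(W, log T, delta)`. The `M` completed-data estimates are simply averaged. The names `lin_ying`, `lspc_probs`, `dummies`, and `draw` are hypothetical.

```python
import numpy as np

def lin_ying(t, d, Z):
    """Closed-form Lin-Ying estimator for the additive hazards model lambda0(t) + Z'beta."""
    order = np.argsort(-t, kind="stable")             # decreasing time, so cumulative sums are risk sets
    t, d, Z = t[order], d[order], Z[order]
    c0 = np.arange(1, len(t) + 1, dtype=float)
    c1 = np.cumsum(Z, axis=0)
    c2 = np.cumsum(Z[:, :, None] * Z[:, None, :], axis=0)

    def at_risk(s):                                   # S0, S1, S2 over {k : t_k >= s}
        j = np.searchsorted(-t, -s, side="right") - 1
        return c0[j], c1[j], c2[j]

    S0, S1, _ = at_risk(t)
    b = (d[:, None] * (Z - S1 / S0[:, None])).sum(axis=0)   # sum_i int (Z_i - Zbar) dN_i
    u = np.unique(t)                                  # integrand is constant on (u_{k-1}, u_k]
    R0, R1, R2 = at_risk(u)
    V = R2 - R1[:, :, None] * R1[:, None, :] / R0[:, None, None]
    A = (np.diff(u, prepend=0.0)[:, None, None] * V).sum(axis=0)
    return np.linalg.solve(A, b)

def lspc_probs(F_tr, y_tr, F_new, K=3, sigma=0.5, lam=0.1):
    """Simplified least-squares probabilistic classifier: p(y | f) ~ max(0, alpha_y' phi(f))."""
    def kern(P, Q):
        return np.exp(-((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2) / (2 * sigma ** 2))
    Phi = kern(F_tr, F_tr)
    H = Phi.T @ Phi / len(F_tr)
    h = Phi.T @ np.eye(K)[y_tr] / len(F_tr)           # one column per class
    alpha = np.linalg.solve(H + lam * np.eye(len(F_tr)), h)
    p = np.clip(kern(F_new, F_tr) @ alpha, 0.0, None)
    s = p.sum(axis=1, keepdims=True)
    return np.divide(p, s, out=np.full_like(p, 1.0 / K), where=s > 0)

def dummies(x):                                       # indicators for levels 1 and 2 (level 0 = reference)
    return np.column_stack([x == 1, x == 2]).astype(float)

def draw(P, u):                                       # inverse-CDF draw of one category per row
    return (u[:, None] > P.cumsum(axis=1)[:, :-1]).sum(axis=1)

# Hypothetical simulated data: 3-level X whose distribution depends non-linearly on W.
rng = np.random.default_rng(1)
n, M = 800, 20
W = rng.uniform(size=n)
logits = np.column_stack([np.zeros(n), 2 * np.sin(6 * W), 1.5 * np.cos(6 * W)])
X = draw(np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True), rng.random(n))
beta_true = np.array([0.5, 1.5, 0.8])                 # additive effects of 1{X=1}, 1{X=2}, W
T = rng.exponential(1 / (0.5 + dummies(X) @ beta_true[:2] + beta_true[2] * W))
C = rng.exponential(1.5, size=n)
time, delta = np.minimum(T, C), (T <= C).astype(float)
miss = rng.random(n) < 1 / (1 + np.exp(-(-2.0 + W + delta)))
cc, mi = ~miss, miss

# Estimate P(X = k | W, log T, delta) from complete cases, then impute M times and refit.
F = np.column_stack([W, np.log(time), delta])
F = (F - F.mean(axis=0)) / F.std(axis=0)
probs = lspc_probs(F[cc], X[cc], F[mi])
est = []
for _ in range(M):
    Xi = X.copy()
    Xi[mi] = draw(probs, rng.random(mi.sum()))
    est.append(lin_ying(time, delta, np.column_stack([dummies(Xi), W])))
beta_mi = np.mean(est, axis=0)                        # MI point estimate (average over imputations)
print(beta_mi)
```

Only the point estimate is formed here. Inference would still need a variance estimator suited to frequentist MI.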