“Robust-Squared” Imputation Models Using BART

Examples of "doubly robust" estimators for missing data include augmented inverse probability weighting (AIPWT) models (Robins et al., 1994) and penalized splines of propensity prediction (PSPP) models (Zhang and Little, 2009). Doubly robust estimators have the property that, if either the response propensity or the mean is modeled correctly, a consistent estimator of the population mean is obtained. However, doubly robust estimators can perform poorly when modest misspecification is present in both models (Kang and Schafer, 2007). Here we consider extensions of the AIPWT and PSPP models that use Bayesian Additive Regression Trees (BART; Chipman et al., 2010) to provide highly robust propensity and mean model estimation. We term these "robust-squared" in the sense that the propensity score, the mean, or both can be estimated with minimal model misspecification and then applied to the doubly robust estimator. We examine their behavior via simulations in which the propensity model, the mean model, or both are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We found that BART applied to PSPP and AIPWT provides more robust and efficient estimates than standard PSPP and AIPWT, with the BART-estimated propensity score combined with PSPP providing the most efficient estimator, with close to nominal coverage.
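To make the double-robustness property concrete, the following is a minimal sketch of the standard AIPW estimator of a population mean, not the authors' implementation. The function names (`aipw_mean`, `expit`), the simulated data-generating model, and the plugged-in "fitted" values are all assumptions made for illustration. In the robust-squared approach the propensities and predicted means would come from BART fits, whereas here they are supplied directly so that one model at a time can be made deliberately wrong.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def aipw_mean(y, r, pi_hat, m_hat):
    """Augmented inverse-probability-weighted estimate of the mean of y.

    y      : outcomes (entries with r == 0 are ignored and may be None)
    r      : response indicators, 1 = observed, 0 = missing
    pi_hat : estimated response propensities P(r = 1 | x)
    m_hat  : predicted outcome means E[y | x]
    """
    total = 0.0
    for yi, ri, pi, mi in zip(y, r, pi_hat, m_hat):
        # Prediction for every unit, plus a weighted residual for respondents.
        correction = (yi - mi) / pi if ri == 1 else 0.0
        total += mi + correction
    return total / len(r)


# Hypothetical simulation: y = 1 + 2x + noise, so the true population mean is 1.
random.seed(2024)
n = 20000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y_full = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Response depends on x, so respondents are not a random subsample.
true_pi = [expit(0.5 + xi) for xi in x]
r = [1 if random.random() < p else 0 for p in true_pi]
y = [yi if ri == 1 else None for yi, ri in zip(y_full, r)]

# Complete-case mean: ignores the missingness mechanism.
cc_mean = sum(yi for yi in y if yi is not None) / sum(r)

# Correct propensity model, badly wrong mean model (constant 0).
est_right_propensity = aipw_mean(y, r, true_pi, [0.0] * n)

# Badly wrong propensity model (constant 0.6), correct mean model.
est_right_mean = aipw_mean(y, r, [0.6] * n, [1.0 + 2.0 * xi for xi in x])
```

In this setup the complete-case mean is biased because response depends on `x`, while both AIPW estimates should land close to the true mean of 1 even though each uses one misspecified model. If both inputs were wrong at once, neither guarantee would apply, which is the situation the flexible BART fits are meant to guard against.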

[1] R. Subramanian, et al. Transitioning to Multiple Imputation - A New Method to Estimate Missing Blood Alcohol Concentration (BAC) Values in FARS, 2002.

[2] Dandan Xu, et al. Sequential BART for imputation of missing covariates, 2016, Biostatistics.

[3] K. Imai, et al. Covariate balancing propensity score, 2014.

[4] Chuanhai Liu. Robit Regression: A Simple Robust Alternative to Logistic and Probit Regression, 2005.

[5] Joseph Kang, et al. Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data, 2007, arXiv:0804.2958.

[6] Roderick J. A. Little, et al. Multiple Imputation for the Fatal Accident Reporting System, 1991.

[7] J. Robins, et al. Estimation of Regression Coefficients When Some Regressors are not Always Observed, 1994.

[8] David Haziza, et al. Doubly robust inference with missing data in survey sampling, 2014.

[9] Erich Barke, et al. Improving Efficiency and Robustness of Analog Behavioral Models, 2007.

[10] Michael R. Elliott, et al. Synthetic Multiple-Imputation Procedure for Multistage Complex Samples, 2016, Journal of Official Statistics.

[11] Michael R. Elliott, et al. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap, 2016, Journal of Survey Statistics and Methodology.

[12] W. Wong, et al. The calculation of posterior distributions by data augmentation, 1987.

[13] H. Chipman, et al. BART: Bayesian Additive Regression Trees, 2008, arXiv:0806.3286.

[14] Peisong Han, et al. A further study of the multiply robust estimator in missing data analysis, 2014.

[15] Jae Kwang Kim, et al. Imputation using response probability, 2006.

[16] R. Tibshirani, et al. Bayesian backfitting (with comments and a rejoinder by the authors), 2000.

[17] D. Rubin, et al. Multiple Imputation of Missing Blood Alcohol Concentration (BAC) Values in FARS, 1998.

[18] Radu V. Craiu, et al. Nonparametric imputation method for nonresponse in surveys, 2016, Statistical Methods & Applications.

[19] S. Chib, et al. Bayesian analysis of binary and polychotomous response data, 1993.

[20] D. Rubin, et al. Statistical Analysis with Missing Data, 1989.

[21] Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 2002.

[22] V. Rocková, et al. Posterior Concentration for Bayesian Regression Trees and their Ensembles, 2017.

[23] Guangyu Zhang, et al. Extensions of the Penalized Spline of Propensity Prediction Method of Imputation, 2009, Biometrics.

[24] Sixia Chen, et al. Multiply robust imputation procedures for the treatment of item nonresponse in surveys, 2017.

[25] R. Little, et al. Robust Likelihood-based Analysis of Multivariate Data with Missing Values, 2003.

[26] Adam Kapelner, et al. Prediction with missing data via Bayesian Additive Regression Trees, 2013, ArXiv.

[27] T. M. Klein, et al. A Method for Estimating Posterior BAC Distributions for Persons Involved in Fatal Traffic Accidents, 1986.

[28] B. Ripley, et al. Semiparametric Regression: Preface, 2003.

[29] James Hedlund. Traffic safety performance measures for states and federal agencies, 2008.

[30] Hao Wang, et al. Bayesian quantile additive regression trees, 2016, arXiv:1607.02676.

[31] Jerome P. Reiter, et al. The importance of modeling the sampling design in multiple imputation for missing data, 2006.

[32] Lu Wang, et al. Estimation with missing data: beyond double robustness, 2013.

[33] Andrew J. Copas, et al. Combining Multiple Imputation and Inverse-Probability Weighting, 2012, Biometrics.

[34] Russell V. Lenth. Statistical Analysis With Missing Data (2nd ed.) (Book), 2004.

[35] D. Cox, et al. An Analysis of Transformations, 1964.