Multiple Imputation by Ordered Monotone Blocks With Application to the Anthrax Vaccine Research Program

Multiple imputation (MI) has become a standard statistical technique for dealing with missing values. The CDC Anthrax Vaccine Research Program (AVRP) dataset created new challenges for MI due to the large number of variables of different types and the limited sample size. A common method for imputing missing data in such complex studies is to specify, for each of J variables with missing values, a univariate conditional distribution given all other variables, and then to draw imputations by iterating over the J conditional distributions. Such fully conditional imputation strategies have the theoretical drawback that the conditional distributions may be incompatible. When the missingness pattern is monotone, a theoretically valid approach is to specify, for each variable with missing values, a conditional distribution given the variables with fewer or the same number of missing values and sequentially draw from these distributions. In this article, we propose the “multiple imputation by ordered monotone blocks” approach, which combines these two basic approaches by decomposing any missingness pattern into a collection of smaller “constructed” monotone missingness patterns, and iterating. We apply this strategy to impute the missing data in the AVRP interim data. Supplemental materials, including all source code and a synthetic example dataset, are available online.
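The two ideas named above, a monotone missingness pattern and a "constructed" monotone block inside a non-monotone pattern, can be illustrated with a small sketch. It is not the article's algorithm. The function names, the toy mask, and the trailing-run rule for choosing a block are assumptions made for illustration only. A boolean mask marks missing entries, and a pattern counts as monotone when, with columns ordered from fewest to most missing values, every row that is missing one column is also missing all later columns.

```python
import numpy as np

def order_by_missingness(mask):
    """Column indices sorted from fewest to most missing values."""
    return np.argsort(mask.sum(axis=0), kind="stable")

def is_monotone(mask):
    """True if the missingness pattern is monotone.

    After ordering columns by missing count, the indicators in each row
    must be non-decreasing (once missing, missing in all later columns).
    """
    ordered = mask[:, order_by_missingness(mask)]
    return bool(np.all(ordered[:, :-1] <= ordered[:, 1:]))

def split_monotone_block(mask):
    """Split a mask into a monotone block and the remaining missing entries.

    Illustrative rule (an assumption, not the article's): in each row, the
    block takes the trailing run of missing values in the ordered columns;
    missing entries followed by an observed value are deferred.
    """
    order = order_by_missingness(mask)
    ordered = mask[:, order]
    # Reverse cumulative AND: True only where all later columns are missing.
    trailing = np.logical_and.accumulate(ordered[:, ::-1], axis=1)[:, ::-1]
    block = np.zeros_like(mask)
    block[:, order] = trailing          # map back to original column order
    remainder = mask & ~block
    return block, remainder

# Toy pattern: rows are units, columns are variables, True means missing.
mask = np.array([
    [False, False, True],
    [False, True,  True],
    [True,  False, False],   # missing first column only: breaks monotonicity
    [False, False, False],
], dtype=bool)

block, remainder = split_monotone_block(mask)
print(is_monotone(mask), is_monotone(block), int(remainder.sum()))
```

In this toy case the full pattern is not monotone, the extracted block is, and one missing entry is left over. Under the approach described in the abstract, the entries in a monotone block would be imputed by sequential draws from the conditional distributions, and the procedure would then iterate over the remaining blocks.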
