Bayesian multiple imputation for large-scale categorical data with structural zeros

We propose an approach for multiple imputation of items missing at random in large-scale surveys whose variables are exclusively categorical and subject to structural zeros. We use mixtures of multinomial distributions as imputation engines and account for structural zeros by treating the observed data as a truncated sample from a hypothetical population without structural zeros. The approach has several appealing features: imputations are generated from coherent Bayesian joint models that automatically capture complex dependencies and readily scale to large numbers of variables. We outline a Gibbs sampling algorithm for implementing the approach and illustrate its potential with a repeated sampling study using public-use census microdata from the state of New York, USA.
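To make the truncation idea concrete, the following is a minimal sketch of a single imputation step, assuming the mixture parameters are already known. It is not the Gibbs sampler outlined in the paper, which also updates the parameters. The weights, the probability tables, the structural-zero rule, and the names `WEIGHTS`, `PHI`, `is_structural_zero`, and `impute` are all hypothetical and chosen only for illustration. Missing items are drawn from the unrestricted mixture of multinomials given the observed items, and any completed record that falls in the structural-zero set is rejected, which yields draws from the truncated distribution.

```python
import random

# Hypothetical toy model: 2 latent classes, 3 categorical variables.
WEIGHTS = [0.6, 0.4]  # mixture weights (assumed values)
# PHI[k][j][c] = P(x_j = c | latent class k) (assumed values)
PHI = [
    [[0.7, 0.3], [0.5, 0.5], [0.2, 0.3, 0.5]],
    [[0.1, 0.9], [0.8, 0.2], [0.6, 0.3, 0.1]],
]


def is_structural_zero(record):
    """Hypothetical rule: x_0 = 1 together with x_2 = 2 is impossible."""
    return record[0] == 1 and record[2] == 2


def draw_category(weights, rng):
    """Draw an index with probability proportional to `weights`."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def impute(record, rng, max_tries=10_000):
    """Fill the None entries of `record` by rejection sampling.

    Proposals come from the unrestricted mixture conditional on the
    observed items; completed records in the structural-zero set are
    discarded, so accepted draws follow the truncated model.
    """
    # Unnormalized class probabilities given the observed items.
    class_weights = []
    for k, weight in enumerate(WEIGHTS):
        for j, value in enumerate(record):
            if value is not None:
                weight *= PHI[k][j][value]
        class_weights.append(weight)

    for _ in range(max_tries):
        k = draw_category(class_weights, rng)
        completed = [
            value if value is not None else draw_category(PHI[k][j], rng)
            for j, value in enumerate(record)
        ]
        if not is_structural_zero(completed):
            return completed
    raise RuntimeError("no admissible completion found")


if __name__ == "__main__":
    rng = random.Random(1)
    # x_0 is observed as 1, so x_2 = 2 can never be imputed.
    print(impute([1, None, None], rng))
```

Rejection is only one way to respect the truncation. It is simple but can be slow when the structural-zero set carries a large share of the unrestricted probability mass.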
