Sequential Imputations and Bayesian Missing Data Problems

Abstract: For missing data problems, Tanner and Wong have described a data augmentation procedure that approximates the actual posterior distribution of the parameter vector by a mixture of complete data posteriors. Their method of constructing the complete data sets is closely related to the Gibbs sampler. Both require iterations and, as with the EM algorithm, convergence can be slow. In this article we introduce an alternative procedure that imputes the missing data sequentially and computes appropriate importance sampling weights. In many applications this new procedure works very well without the need for iterations. Sensitivity analysis, influence analysis, and updating with new data can be performed cheaply. Bayesian prediction and model selection can also be incorporated. Examples taken from a wide range of applications are used for illustration.
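To make the idea of sequential imputation with importance weights concrete, here is a minimal sketch under assumptions that are not taken from the abstract: a 2x2 contingency table with a Dirichlet prior on the cell probabilities, where each observation is a pair (a, b) and only b may be missing (written as None). Each observation is processed in order. The weight is multiplied by the predictive probability of the observed part given the data completed so far, and a missing b is drawn from its conditional predictive distribution. The function names `sequential_imputation` and `posterior_mean` are hypothetical and chosen only for illustration.

```python
import numpy as np

def sequential_imputation(data, prior, n_imputations=2000, seed=0):
    """Illustrative sketch for a 2x2 table with a Dirichlet prior.

    data  : list of (a, b) pairs, a in {0, 1}, b in {0, 1} or None (missing)
    prior : 2x2 array of Dirichlet parameters for the cell probabilities
    Returns importance weights (m,) and completed cell counts (m, 2, 2).
    """
    rng = np.random.default_rng(seed)
    prior = np.asarray(prior, dtype=float)
    m = n_imputations
    counts = np.zeros((m, 2, 2))   # one completed table per imputation
    weights = np.ones(m)
    rows = np.arange(m)
    for a, b in data:
        # Predictive cell probabilities given the data completed so far.
        post = prior + counts
        pred = post / post.sum(axis=(1, 2), keepdims=True)
        if b is None:
            # Predictive probability of the observed part (a alone).
            p_obs = pred[:, a, :].sum(axis=1)
            # Impute b from its conditional predictive distribution.
            b_draw = (rng.random(m) < pred[:, a, 1] / p_obs).astype(int)
            counts[rows, a, b_draw] += 1.0
        else:
            p_obs = pred[:, a, b]
            counts[:, a, b] += 1.0
        weights *= p_obs           # accumulate the importance weight
    return weights, counts

def posterior_mean(weights, counts, prior):
    """Weighted mixture of complete-data posterior means of the cell probabilities."""
    post = np.asarray(prior, dtype=float) + counts
    means = post / post.sum(axis=(1, 2), keepdims=True)
    w = weights / weights.sum()
    return np.tensordot(w, means, axes=1)

if __name__ == "__main__":
    toy = [(0, 0), (1, None), (1, 1), (0, None), (0, 1), (1, None)]
    w, c = sequential_imputation(toy, np.ones((2, 2)))
    print(posterior_mean(w, c, np.ones((2, 2))))
```

No iteration is involved: each imputed data set is built in a single pass, and the weights correct for the fact that each missing value is drawn using only the observations seen up to that point. With no missing entries, every weight is identical and the estimate reduces to the ordinary complete-data posterior mean.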

[1] D. Cox. Note on Grouping, 1957.

[2] P. Odell, et al. A Numerical Procedure to Generate a Sample Covariance Matrix, 1966.

[3] G. C. Tiao, et al. Bayesian inference in statistical analysis, 1973.

[4] T. Ferguson. Prior Distributions on Spaces of Probability Measures, 1974.

[5] B. Efron, et al. Data Analysis Using Stein's Estimator and its Generalizations, 1975.

[6] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm, plus discussions on the paper, 1977.

[7] T. Speed, et al. Markov Fields and Log-Linear Interaction Models for Contingency Tables, 1980.

[8] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] T. Speed, et al. Recursive causal models, 1984, Journal of the Australian Mathematical Society. Series A. Pure Mathematics and Statistics.

[10] D. Rubin, et al. The calculation of posterior distributions by data augmentation, 1987.

[11] Donald B. Rubin, et al. Comment: A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when fractions of missing information are modest: The SIR Algorithm, 1987.

[12] D. Rubin. Multiple imputation for nonresponse in surveys, 1989.

[13] W. Wong, et al. The calculation of posterior distributions by data augmentation, 1987.

[14] D. Rubin, et al. Statistical Analysis with Missing Data, 1989.

[15] David J. Spiegelhalter, et al. Local computations with probabilities on graphical structures and their application to expert systems, 1990.

[16] J. Besag. A candidate's formula: A curious result in Bayesian prediction, 1989.

[17] L. Tierney, et al. Approximate methods for assessing influence and sensitivity in Bayesian analysis, 1989.

[18] Adrian F. M. Smith, et al. Sampling-Based Approaches to Calculating Marginal Densities, 1990.

[19] David J. Spiegelhalter, et al. Sequential updating of conditional probabilities on directed graphical structures, 1990, Networks.

[20] A. Gelfand, et al. Nonparametric Bayesian bioassay including ordered polytomous response, 1991.

[21] A. Kong, et al. Sequential imputation and multipoint linkage analysis, 1993, Genetic Epidemiology.

[22] A. Dawid, et al. Hyper Markov Laws in the Statistical Analysis of Decomposable Graphical Models, 1993.

[23] Jun S. Liu, et al. Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes, 1994.

[24] M. Escobar. Estimating Normal Means with a Dirichlet Process Prior, 1994.

[25] Jun S. Liu. Nonparametric hierarchical Bayes via sequential imputations, 1996.