Convergence Properties of a Sequential Regression Multiple Imputation Algorithm

A sequential regression or chained equations imputation approach uses a Gibbs sampling-type iterative algorithm that imputes the missing values using a sequence of conditional regression models. It is a flexible approach for handling different types of variables and complex data structures. Many simulation studies have shown that multiple imputation inferences based on this procedure have desirable repeated sampling properties. However, a theoretical weakness of this approach is that the specified set of conditional regression models may not be compatible with any joint distribution of the variables being imputed. Hence, the convergence properties of the iterative algorithm are not well understood. This article develops conditions for convergence and assesses the properties of inferences from both compatible and incompatible sequences of regression models. The results are established for the missing data pattern in which each subject may be missing a value on at most one variable. The regression models in the sequence are assumed to fit the data well empirically, having been chosen by the imputer based on appropriate model diagnostics. The results are used to develop criteria for the choice of regression models. Supplementary materials for this article are available online.
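As an illustration of the kind of iterative algorithm described above, the following is a minimal sketch, not the article's implementation. It assumes every variable is continuous and imputed from a normal linear regression on all other variables, with parameters drawn under a noninformative prior. The function names `draw_linear_imputations` and `sequential_regression_impute`, the mean-fill starting values, and the default number of iterations are all hypothetical choices made for this example.

```python
import numpy as np


def draw_linear_imputations(X_obs, y_obs, X_mis, rng):
    """Draw imputations from a normal linear model (assumed noninformative prior)."""
    n, p = X_obs.shape
    XtX_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = XtX_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # Draw the residual variance, then the coefficients given that variance.
    sigma2 = resid @ resid / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # Draw the missing values from the conditional predictive distribution.
    noise = np.sqrt(sigma2) * rng.standard_normal(X_mis.shape[0])
    return X_mis @ beta + noise


def sequential_regression_impute(data, n_iter=10, seed=0):
    """Cycle through the variables, re-imputing each from a regression on the rest."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Starting values: fill each missing entry with its column mean (an assumption).
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    n, k = data.shape
    for _ in range(n_iter):
        for j in range(k):
            mis_j = missing[:, j]
            if not mis_j.any():
                continue
            # Predictors are the current (observed or imputed) values of the others.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(n), others])
            filled[mis_j, j] = draw_linear_imputations(
                design[~mis_j], filled[~mis_j, j], design[mis_j], rng
            )
    return filled
```

Running the function several times with different seeds would give multiple completed datasets. Each conditional model here is linear and normal, so this particular sequence is compatible with a multivariate normal joint distribution; replacing one regression with a different model type is the kind of change that can make the sequence incompatible.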
