Relative efficiency of joint-model and full-conditional-specification multiple imputation when conditional models are compatible: The general location model

Estimating the parameters of a regression model of interest is complicated by missing data on the variables in that model. Multiple imputation is commonly used to handle these missing data. Joint-model multiple imputation and full-conditional-specification multiple imputation are known to yield imputed data with the same asymptotic distribution when the conditional models of full-conditional specification are compatible with the joint model. We show that this asymptotic equivalence of imputation distributions does not imply that the two methods yield asymptotically equally efficient inference about the parameters of the model of interest, nor that they are equally robust to misspecification of the joint model. When the conditional models used by full-conditional-specification multiple imputation are linear, logistic and multinomial regressions, they are compatible with a restricted general location joint model. We show that multiple imputation using the restricted general location joint model can be substantially more asymptotically efficient than full-conditional-specification multiple imputation, but this typically requires very strong associations between variables. When associations are weaker, the efficiency gain is small. Moreover, when there is substantial missingness in the outcome variable, full-conditional-specification multiple imputation is shown to be potentially much more robust than joint-model multiple imputation using the restricted general location model to misspecification of that model.
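
As an illustration of what compatibility means here, consider the simplest case of a general location model, with one binary variable and one continuous variable. This is a minimal sketch; the symbols X, Y, p, \mu_0, \delta, \sigma^2, \alpha_0 and \alpha_1 are notation assumed for this example and are not taken from the paper. The joint model implies a linear regression for Y given X and a logistic regression for X given Y, which are the kinds of conditional model that full-conditional specification would use.

```latex
% Joint model (illustrative notation): binary X, continuous Y
X \sim \mathrm{Bernoulli}(p), \qquad
Y \mid X = x \;\sim\; N(\mu_0 + \delta x,\ \sigma^2).

% Conditional of Y given X: a linear regression with
% intercept \mu_0, slope \delta and residual variance \sigma^2.

% Conditional of X given Y, by Bayes' theorem:
\log \frac{P(X = 1 \mid Y = y)}{P(X = 0 \mid Y = y)}
  = \log \frac{p}{1 - p}
    + \frac{(y - \mu_0)^2 - (y - \mu_0 - \delta)^2}{2\sigma^2}
  = \alpha_0 + \alpha_1 y,

% with
\alpha_1 = \frac{\delta}{\sigma^2}, \qquad
\alpha_0 = \log \frac{p}{1 - p} - \frac{2\mu_0\delta + \delta^2}{2\sigma^2}.
```

In this sketch the logistic regression treats (\alpha_0, \alpha_1) as free parameters and does not use the normality of Y given X, whereas the joint model does. This is one way to see why the two approaches can differ in efficiency and in robustness even though their conditional models are compatible.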