Multiple Imputation for Missing Values through Conditional Semiparametric Odds Ratio Models

Multiple imputation is a practical approach to handling incompletely observed data in statistical analysis. Rubin's rules for combining results make parameter estimation and inference from the imputed complete data sets straightforward. However, creating proper imputations that accommodate flexible analysis models can be very challenging in practice. We propose an imputation framework that uses conditional semiparametric odds ratio models to impute the missing values. The proposed framework is more flexible and robust than imputation based on the normal model. Unlike imputation based on fully conditionally specified models, it is a compatible framework. The proposed multiple imputation algorithms, which use Markov chain Monte Carlo sampling, are straightforward to carry out. Simulation studies demonstrate that the proposed approach performs better than existing, commonly used imputation approaches. The proposed approach is applied to imputing missing values in bone fracture data.
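
As an illustration of the result-combination step mentioned above, the following is a minimal sketch of the standard Rubin combining rules for a scalar parameter. It is not the paper's imputation algorithm: the function name `pool_rubin` and its inputs are hypothetical, and the sketch assumes the m completed-data analyses have already produced point estimates and their variance estimates.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data results for one scalar parameter.

    estimates: point estimates, one per imputed data set
    variances: the corresponding within-imputation variance estimates
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching lengths")

    q_bar = mean(estimates)    # pooled point estimate
    w = mean(variances)        # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, t


# Hypothetical inputs, for illustration only.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations.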
