Paper Information - A Non-Parametric EM-Style Algorithm for Imputing Missing Values

A Non-Parametric EM-Style Algorithm for Imputing Missing Values

We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach compared to parametric EM methods are that: 1) it is more efficient for moderate-size data sets, and 2) it is less susceptible to errors that parametric methods make when the parametric models do not fit the data well. The robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.
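The iterative scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `knn_iterative_impute`, the column-mean initialization, the Euclidean distance over all features, the unweighted neighbor average, and the stopping tolerance are all assumptions made for illustration. The sketch only aims to show how, with k-nearest neighbor, overwriting a filled-in value immediately changes the data that later neighbor searches use, so there is no separate model-fitting step.

```python
import math


def knn_iterative_impute(rows, k=2, max_iters=20, tol=1e-6):
    """Hypothetical sketch: re-impute missing cells (NaN) with k-NN until stable."""
    n, d = len(rows), len(rows[0])
    missing = [[math.isnan(v) for v in row] for row in rows]

    # Assumed initialization: fill each missing cell with its column's observed mean.
    # (Assumes every column has at least one observed value.)
    means = []
    for j in range(d):
        observed = [rows[i][j] for i in range(n) if not missing[i][j]]
        means.append(sum(observed) / len(observed))
    filled = [
        [means[j] if missing[i][j] else rows[i][j] for j in range(d)]
        for i in range(n)
    ]

    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            if not any(missing[i]):
                continue
            # Neighbors are found in the current filled-in data, which acts as the model.
            others = sorted(
                (r for r in range(n) if r != i),
                key=lambda r: math.dist(filled[i], filled[r]),
            )
            nbrs = others[:k]
            for j in range(d):
                if missing[i][j]:
                    new_val = sum(filled[r][j] for r in nbrs) / len(nbrs)
                    max_change = max(max_change, abs(new_val - filled[i][j]))
                    # Overwriting in place updates the "model" for later rows.
                    filled[i][j] = new_val
        if max_change < tol:  # filled-in values have stopped moving
            break
    return filled
```

Calling `knn_iterative_impute(data, k=2)` on a list of equal-length rows with `float("nan")` marking missing entries returns a new list of rows with those entries filled in; observed values are left unchanged. A kernel-regression variant would replace the unweighted neighbor average with a distance-weighted one.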

Rich Caruana

[1] Michael I. Jordan, et al. Supervised learning from incomplete data via an EM approach, 1993, NIPS.

[2] S. Lauritzen. The EM algorithm for graphical association models with missing data, 1995.

[3] Constantin F. Aliferis, et al. An evaluation of machine-learning methods for predicting pneumonia mortality, 1997, Artif. Intell. Medicine.

[4] Nicole A. Lazar, et al. Statistical Analysis With Missing Data, 2003, Technometrics.