A Non-Parametric EM-Style Algorithm for Imputing Missing Values

We present an iterative non-parametric algorithm for imputing missing values. The algorithm is similar to EM except that it uses non-parametric models such as k-nearest neighbor or kernel regression instead of the parametric models used with EM. An interesting feature of the algorithm is that the E and M steps collapse into a single step because the data being filled in is the model: updating the filled-in values updates the model at the same time. The main advantages of this approach compared to parametric EM methods are that: 1) it is more efficient for moderate-size data sets, and 2) it is less susceptible to errors that parametric methods make when the parametric models do not fit the data well. The robustness to model failure makes the non-parametric method more accurate when models of the data are not known a priori and cannot be determined reliably. We evaluate the method using a real medical data set that has many missing values.
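
To make the collapsed E/M step concrete, the following is a minimal sketch of one way such an iteration could look with k-nearest neighbor. It is an illustration under stated assumptions, not the exact procedure evaluated here: the function name iterative_knn_impute, the column-mean initialization, the Euclidean distance over all attributes, the unweighted neighbor average, and the parameters k, n_iters, and tol are all hypothetical choices. It also assumes every column has at least one observed value.

```python
import numpy as np

def iterative_knn_impute(X, k=3, n_iters=10, tol=1e-6):
    """Hypothetical sketch: repeatedly re-impute NaN entries with k-NN.

    The filled-in data set is itself the model, so overwriting an
    imputed value updates the estimate and the model in one step.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Assumption: initialize missing entries with observed column means.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    rows_with_missing = np.where(missing.any(axis=1))[0]

    for _ in range(n_iters):
        previous = filled.copy()
        for i in rows_with_missing:
            # Distances are computed on the current filled-in data.
            dists = np.sqrt(((filled - filled[i]) ** 2).sum(axis=1))
            dists[i] = np.inf  # a row is never its own neighbor
            neighbors = np.argsort(dists)[:k]
            # Replace only the originally missing entries of row i with
            # the neighbors' average; observed values are never changed.
            filled[i, missing[i]] = filled[neighbors][:, missing[i]].mean(axis=0)
        # Stop early once the imputed values no longer move.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled
```

Because each pass reads from and writes to the same array, there is no separate model-fitting step. A kernel regression variant would, under the same assumptions, replace the unweighted neighbor mean with a distance-weighted average.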