Mutual information for feature selection with missing data

Feature selection is an important task for many machine learning applications; moreover missing data are encoutered very often in practice. This paper proposes to adapt a nearest neighbors based mutual information estimator to handle missing data and to use it to achieve feature selection. Results on artificial and real world datasets show that the method is able to select important features without the need for any imputation algorithm. Moreover, experiments also indicate that selecting the features before imputing the data generally increases the precision of the prediction models.