An interactive feature selection method based on learning-from-crowds

Ensemble feature selection algorithms aggregate the results of multiple feature selection methods to select an effective subset of features. However, these algorithms typically treat every feature selection method equally and ignore differences in their performance. Consequently, features selected by only a relatively small number of methods may be excluded. To address this problem, we propose an interactive feature selection method that aggregates the results of multiple feature selection methods more effectively and iteratively refines the selected features by integrating expert knowledge. The proposed method consists of a learning-from-crowds-based ensemble feature selection algorithm and a visual analysis system. The algorithm models the performance of each feature selection method, estimates its reliability, and aggregates the results accordingly. To integrate expert knowledge, the visual analysis system provides a set of ranking schemes that help experts understand the results of individual feature selection methods and the roles that features play in classification tasks. A numerical experiment on four real-world datasets shows that the proposed algorithm improves classification accuracy by 0.63%–2.85% compared with state-of-the-art ensemble feature selection algorithms. In addition, case studies on text and image data demonstrate that the proposed visual analysis system can further improve classification accuracy by 0.28%–5.24%.
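The abstract does not state which learning-from-crowds model the algorithm uses. The following is a minimal sketch of the general idea under two assumptions. First, each feature selection method is treated as a crowd annotator that casts a binary "selected / not selected" vote on every feature. Second, reliabilities are estimated with a one-coin Dawid–Skene model (one accuracy parameter per method) fitted by EM. The function `crowd_aggregate`, the helper `_sigmoid`, and the binary-vote input format are hypothetical illustrations, not the authors' algorithm.

```python
import math


def _sigmoid(d):
    # Numerically stable logistic function.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def crowd_aggregate(votes, n_iter=50, eps=1e-6):
    """Hypothetical one-coin Dawid-Skene EM over feature-selection votes.

    votes[j][i] is 1 if feature selection method j selected feature i, else 0.
    Returns (reliabilities, posteriors): an estimated accuracy per method and
    an estimated probability that each feature is relevant.
    """
    m, n = len(votes), len(votes[0])
    # Start from the equal-weight vote share that plain ensembles use.
    post = [sum(votes[j][i] for j in range(m)) / m for i in range(n)]
    rel = [0.5] * m
    for _ in range(n_iter):
        # M-step: reliability = expected agreement of method j with the latent labels.
        for j in range(m):
            agree = sum(post[i] if votes[j][i] else 1.0 - post[i] for i in range(n))
            rel[j] = min(max(agree / n, eps), 1.0 - eps)
        prior = min(max(sum(post) / n, eps), 1.0 - eps)
        # E-step: log-odds that feature i is relevant. Each method's vote is
        # weighted by log(rel / (1 - rel)), so less reliable methods count less.
        for i in range(n):
            d = math.log(prior / (1.0 - prior))
            for j in range(m):
                w = math.log(rel[j] / (1.0 - rel[j]))
                d += w if votes[j][i] else -w
            post[i] = _sigmoid(d)
    return rel, post


# Toy example (made-up votes): four methods, six features; method 3 disagrees with the rest.
votes = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
]
rel, post = crowd_aggregate(votes)
selected = [i for i, p in enumerate(post) if p > 0.5]
print([round(r, 2) for r in rel], selected)
```

In this sketch the method that disagrees with the others receives a low reliability, so its votes carry little (here even slightly negative) weight, unlike equal-weight voting. Thresholding posteriors at 0.5 is one arbitrary choice; keeping the top-k features by posterior would work equally well with this function.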