Stable and Accurate Feature Selection from Microarray Data with Ensembled Fast Correlation Based Filter

Feature selection plays an important role in analyzing high-dimensional, low-sample-size gene expression profiles, both for accurate disease classification and for a deeper understanding of the underlying biological mechanisms. Beyond classification performance, the stability of the selected features is another important criterion for evaluating a feature selector, since stable selection results increase confidence in the selected features as candidates for true biomarker discovery and further biological validation. In this study, we propose a novel feature selection method under the ensemble learning framework. Specifically, we use the Fast Correlation Based Filter (FCBF) as the base feature selector and apply it to subsamples of the microarray data. We then present several aggregation methods for combining the resulting feature subsets. Finally, two stability measures are used to quantify the robustness of feature selectors to data variations. Our comparative empirical study on publicly available datasets demonstrates the superiority of the proposed methods over their competitors in both stability and classification accuracy.
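
The pipeline outlined above (subsample the data, run a base selector on each subsample, aggregate the resulting subsets, then score stability) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the method evaluated in this study: the base selector is a simple correlation ranker standing in for the Fast Correlation Based Filter, aggregation is plain frequency voting, stability is the mean pairwise Jaccard index, and the function names, parameters, and synthetic data are hypothetical.

```python
import numpy as np
from itertools import combinations


def base_selector(X, y, k):
    """Stand-in base selector (NOT FCBF): top-k features by |Pearson r| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Small constant guards against division by zero for constant columns.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    scores = np.abs(Xc.T @ yc) / denom
    return set(np.argsort(scores)[::-1][:k].tolist())


def ensemble_select(X, y, k, n_subsamples=20, frac=0.8, seed=0):
    """Run the base selector on random subsamples and aggregate by voting."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    size = max(2, int(frac * n))
    subsets = []
    for _ in range(n_subsamples):
        # Subsample without replacement (an assumed sampling scheme).
        idx = rng.choice(n, size=size, replace=False)
        subsets.append(base_selector(X[idx], y[idx], k))
    # Frequency voting: count how often each feature was selected.
    counts = np.zeros(X.shape[1], dtype=int)
    for s in subsets:
        counts[list(s)] += 1
    # Stable sort so that ties are broken by feature index.
    order = np.argsort(-counts, kind="stable")
    return order[:k].tolist(), subsets


def mean_jaccard(subsets):
    """Assumed stability measure: average Jaccard index over all subset pairs."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Synthetic stand-in for microarray data: 60 samples, 200 features,
# of which only the first three are shifted between the two classes.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 30)
X = rng.normal(size=(60, 200))
X[:, :3] += 3.0 * y[:, None]

selected, subsets = ensemble_select(X, y, k=3)
print("selected features:", sorted(selected))
print("stability (mean Jaccard):", mean_jaccard(subsets))
```

Replacing `base_selector` with an actual FCBF implementation, or swapping the voting step for a different aggregation rule, leaves the rest of the sketch unchanged, which is the main appeal of the ensemble framing.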