Stability of Ensemble Feature Selection on High-Dimension and Low-Sample Size Data - Influence of the Aggregation Method

Feature selection is an important step when building a classifier. However, feature selection tends to be unstable on high-dimensional, small-sample-size data. This instability reduces the usefulness of the selected features for knowledge discovery: if the selected feature subset is not robust, domain experts can have little trust that the features are relevant. A growing number of studies deal with feature selection stability. Because ensemble methods are commonly used to improve the accuracy and stability of classifiers, some works have focused on the stability of ensemble feature selection methods. So far, they have obtained mixed results, and as far as we know, no study has extensively examined how the choice of the aggregation method influences the stability of ensemble feature selection. This is what we study in this preliminary work. We first present some aggregation methods, then we study the stability of ensemble feature selection based on them, on both artificial and real data, as well as the resulting classification performance.
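To make the setting concrete, the following is a minimal sketch of ensemble feature selection with two interchangeable aggregation rules, together with one possible stability measure. Everything in it is an illustrative assumption rather than the method studied in this work: the t-like scoring function, the bootstrap resampling scheme, the two aggregation rules (`mean_score` and `mean_rank`), the average pairwise Jaccard index as the stability measure, and all function and parameter names.

```python
import numpy as np


def score_features(X, y):
    """Hypothetical base selector: t-like score per feature for binary labels 0/1."""
    X0, X1 = X[y == 0], X[y == 1]
    diff = np.abs(X0.mean(axis=0) - X1.mean(axis=0))
    spread = np.sqrt(X0.var(axis=0) + X1.var(axis=0)) + 1e-12  # avoid division by zero
    return diff / spread


def ensemble_select(X, y, k, n_bootstraps=30, aggregation="mean_rank", seed=0):
    """Score features on bootstrap samples, aggregate, and return the top-k indices."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.empty((n_bootstraps, p))
    for b in range(n_bootstraps):
        idx = rng.choice(n, size=n, replace=True)
        # Redraw if a bootstrap sample happens to miss one of the two classes.
        while len(np.unique(y[idx])) < 2:
            idx = rng.choice(n, size=n, replace=True)
        scores[b] = score_features(X[idx], y[idx])

    if aggregation == "mean_score":
        agg = scores.mean(axis=0)
    elif aggregation == "mean_rank":
        # Double argsort turns scores into ranks (0 = lowest score in that bootstrap).
        ranks = scores.argsort(axis=1).argsort(axis=1)
        agg = ranks.mean(axis=0)
    else:
        raise ValueError(f"unknown aggregation: {aggregation}")

    # Higher aggregated value means more relevant in both rules.
    return set(np.argsort(agg)[-k:].tolist())


def jaccard_stability(subsets):
    """Average pairwise Jaccard index over a list of non-empty selected feature sets."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return float(np.mean([len(a & b) / len(a | b) for a, b in pairs]))
```

Under these assumptions, stability could be estimated by calling `ensemble_select` on several random subsamples of the training data and passing the resulting feature sets to `jaccard_stability`, once per aggregation rule, so that the rules can be compared on the same data.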
