Ensemble feature selection for high-dimensional data: a stability analysis across multiple domains

Selecting a subset of relevant features is crucial to the analysis of high-dimensional datasets from a number of application domains, such as biomedicine, document analysis and image analysis. Since no single selection algorithm seems capable of ensuring optimal results in terms of both predictive performance and stability (i.e. robustness to changes in the input data), researchers have increasingly explored the effectiveness of “ensemble” approaches that combine different selectors. While interesting proposals have been reported in the literature, most of them have so far been evaluated in a limited number of settings (e.g. with data from a single domain and in conjunction with specific selection approaches), leaving important questions about the large-scale applicability and utility of ensemble feature selection unanswered. To contribute to the field, this work presents an empirical study that encompasses different kinds of selection algorithms (filters and embedded methods, univariate and multivariate techniques) and different application domains. Specifically, we consider 18 classification tasks with heterogeneous characteristics (in terms of number of classes and instances-to-features ratio) and experimentally evaluate, for feature subsets of different cardinalities, the extent to which an ensemble approach is more robust than a single selector, thus providing useful insight for both researchers and practitioners.
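As an illustration of the two notions the abstract relies on, the following minimal Python sketch shows one possible way to build an ensemble selector and to quantify stability. Everything in it is an assumption made for illustration and is not taken from the study: the toy univariate filter (`mean_difference_scores`), the homogeneous bootstrap ensemble with mean-rank aggregation (`ensemble_top_k`), and the use of the average pairwise Jaccard index (`average_jaccard`) as the stability measure. Binary labels coded as 0 and 1 are assumed.

```python
import random
from itertools import combinations


def mean_difference_scores(rows, labels):
    """Toy univariate filter (illustrative): |mean of class 1 - mean of class 0| per feature."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            # A resample may contain a single class; the feature is then uninformative.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features (ties broken by feature index)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return set(order[:k])


def ensemble_top_k(rows, labels, k, n_bootstraps, rng):
    """Run the same filter on bootstrap samples and aggregate the rankings by summed rank."""
    n, n_features = len(rows), len(rows[0])
    rank_sums = [0] * n_features
    for _ in range(n_bootstraps):
        idx = [rng.randrange(n) for _ in range(n)]  # sample instances with replacement
        scores = mean_difference_scores([rows[i] for i in idx],
                                        [labels[i] for i in idx])
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            rank_sums[j] += rank
    # A lower summed rank means the feature was ranked consistently near the top.
    order = sorted(range(n_features), key=lambda j: rank_sums[j])
    return set(order[:k])


def average_jaccard(subsets):
    """Stability as the mean pairwise Jaccard similarity of non-empty selected subsets."""
    pairs = list(combinations(subsets, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a comparison in the spirit of the study would perturb the training data several times (for example by resampling), collect the subset chosen by `top_k` and by `ensemble_top_k` on each perturbed version, and compare the two resulting `average_jaccard` values for each subset size `k`.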
