uEFS: An efficient and comprehensive ensemble-based feature selection methodology to select informative features

Feature selection is one of the most critical methods for choosing appropriate features from a larger set of candidates. The task involves two basic steps: ranking and filtering. The former ranks all features, while the latter filters out irrelevant features based on a threshold value. Several feature selection methods with well-documented capabilities and limitations have already been proposed. Feature ranking is likewise nontrivial, as it requires designating an optimal cutoff value in order to properly select important features from a list of candidates. However, a comprehensive feature ranking and filtering approach that alleviates the existing limitations and provides an efficient mechanism for achieving optimal results is still lacking. In view of this, we present an efficient and comprehensive univariate ensemble-based feature selection (uEFS) methodology to select informative features from an input dataset. For the uEFS methodology, we first propose a unified features scoring (UFS) algorithm, which generates a final ranked list of features following a comprehensive evaluation of a feature set. To define the cutoff point for removing irrelevant features, we then present a threshold value selection (TVS) algorithm, which selects a subset of features deemed important for classifier construction. The uEFS methodology is evaluated using standard benchmark datasets. Extensive experimental results show that the proposed uEFS methodology provides competitive accuracy and achieves, on average, (1) an increase of around 7% in f-measure and (2) an increase of around 5% in predictive accuracy compared with state-of-the-art methods.
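
The two steps named in the abstract, ranking and filtering, can be illustrated with a minimal sketch. It is not the UFS or TVS algorithm itself, whose details the abstract does not give. It assumes that an ensemble ranking is formed by min-max normalizing the scores of several univariate measures and averaging them, and that the cutoff is a fixed, user-supplied value rather than one chosen by TVS. The function names, the two measures, and the scores are all hypothetical.

```python
from statistics import mean


def min_max(scores):
    """Rescale one measure's scores to [0, 1] so measures are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All features scored the same: the measure carries no ranking signal.
        return {feature: 0.0 for feature in scores}
    return {feature: (s - lo) / (hi - lo) for feature, s in scores.items()}


def unified_ranking(score_tables):
    """Assumed combination rule: average the normalized per-measure scores."""
    normalized = [min_max(table) for table in score_tables]
    features = normalized[0].keys()
    combined = {f: mean(table[f] for table in normalized) for f in features}
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


def filter_by_threshold(ranked, threshold):
    """Keep features whose combined score reaches the cutoff (fixed here)."""
    return [feature for feature, score in ranked if score >= threshold]


# Hypothetical scores from two univariate measures on three features.
info_gain = {"age": 0.40, "income": 0.10, "zip": 0.00}
chi_squared = {"age": 30.0, "income": 20.0, "zip": 10.0}

ranked = unified_ranking([info_gain, chi_squared])
selected = filter_by_threshold(ranked, threshold=0.3)
print(ranked)
print(selected)
```

Normalizing before averaging keeps a measure with a large numeric range, such as chi-squared, from dominating the combined score. Other aggregation rules, for example rank-based ones, would fit the same structure.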
