Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques

This study investigates the effectiveness of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for feature subset selection (FSS) in the binary classification problem (BCP). Recent studies have focused on improving the accuracy of the BCP by using all available features, without attempting to determine the best-performing subset of features. However, for some problems the number of features may reach thousands, which consumes excessive computational power during the feature evaluation and classification phases and may also reduce the accuracy of the results. Selecting the minimum number of features while preserving, or even increasing, accuracy therefore becomes an important issue for fast and accurate binary classification. Our multiobjective evolutionary algorithm consists of two phases: FSS using a GA, and application of ML techniques to the BCP. Since exhaustively evaluating all feature subsets is intractable, a GA is used in the first phase to intelligently identify the most appropriate feature subset. The GA applies multiobjective crossover and mutation operators to improve a population of individuals (each representing a selected feature subset) and to obtain (near-)optimal solutions over the generations. In the second phase of the algorithm, the fitness of the selected subset is evaluated with state-of-the-art ML techniques: Logistic Regression, Support Vector Machines, Extreme Learning Machine, K-means, and Affinity Propagation. The performance of the multiobjective evolutionary algorithm (and of the ML techniques) is assessed through comprehensive experiments and compared with state-of-the-art algorithms: Greedy Search, Particle Swarm Optimization, Tabu Search, and Scatter Search. The proposed algorithm proved to be robust and outperformed the existing methods on most of the datasets.

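To make the two-phase scheme concrete, the sketch below shows one possible way to wrap an ML fitness evaluator inside a GA for feature subset selection, using Python and scikit-learn. It is an illustration only, not the authors' implementation: it folds the two objectives (classification accuracy and subset size) into a simple weighted sum rather than using the paper's multiobjective crossover and mutation operators, and the dataset, classifier, population size, rates, and weight are all illustrative assumptions.

```python
# Minimal sketch of GA-based feature subset selection with an ML classifier
# as the fitness evaluator. Not the paper's algorithm: the two objectives
# (accuracy, subset size) are combined via a weighted sum, and all parameters
# (population size, mutation rate, penalty weight 0.1) are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)   # example binary classification dataset
n_features = X.shape[1]

def fitness(mask):
    """Score a candidate subset: high cross-validated accuracy, few features."""
    if not mask.any():
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask], y, cv=5).mean()
    size_penalty = mask.sum() / n_features
    return acc - 0.1 * size_penalty          # weighted-sum surrogate of the two objectives

def crossover(a, b):
    point = rng.integers(1, n_features)      # single-point crossover
    return np.concatenate([a[:point], b[point:]])

def mutate(mask, rate=0.05):
    flips = rng.random(n_features) < rate    # bit-flip mutation
    return np.logical_xor(mask, flips)

# Initialize a population of random feature subsets (boolean masks).
pop = [rng.random(n_features) < 0.5 for _ in range(20)]

for gen in range(30):
    ranked = sorted(pop, key=fitness, reverse=True)
    parents = ranked[:10]                    # truncation selection
    children = []
    while len(children) < len(pop) - len(parents):
        i, j = rng.choice(len(parents), 2, replace=False)
        children.append(mutate(crossover(parents[i], parents[j])))
    pop = parents + children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best), "fitness:", round(fitness(best), 4))
```

In this sketch the classifier plays the role of the second-phase evaluator: any of the listed ML techniques could be substituted for LogisticRegression, and a true multiobjective variant would keep accuracy and subset size as separate objectives (e.g. via Pareto ranking) instead of the weighted sum used here.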