A Novel Outlook on Feature Selection as a Multi-objective Problem

Feature selection is the process of choosing, or removing, features to obtain the most informative feature subset of minimal size. Such subsets are used to improve the performance of machine learning algorithms and to make the results easier for humans to interpret. Approaches to feature selection in the literature exploit several optimization algorithms. Multi-objective methods have also been proposed, minimizing the number of features and the error at the same time. While most approaches assess error by taking the average of a stochastic K-fold cross-validation, comparing averages might be misleading. In this paper, we show how feature subsets with different average errors might in fact be non-separable when compared using a statistical test. Following this idea, clusters of non-separable optimal feature subsets are identified. Performance in feature selection can thus be evaluated by verifying how many of these optimal feature subsets an algorithm is able to identify. We therefore propose a multi-objective optimization approach to feature selection, EvoFS, with the objectives to i. minimize the feature subset size, ii. minimize the test error on a 10-fold cross-validation using a specific classifier, and iii. maximize the analysis-of-variance value of the lowest-performing feature in the subset. Experiments on classification datasets whose feature subsets can be exhaustively evaluated show that our approach always finds the best feature subsets. Further experiments on a high-dimensional classification dataset, which cannot be exhaustively analyzed, show that our approach finds more optimal feature subsets than state-of-the-art feature selection algorithms.
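As a rough illustration of the three objectives described above, the Python sketch below evaluates a single candidate feature subset with scikit-learn: subset size, mean 10-fold cross-validation error, and the ANOVA F-value of the weakest selected feature (negated so that all objectives are minimized). The logistic-regression classifier, the toy dataset, and the function name evofs_objectives are illustrative assumptions, not the authors' EvoFS implementation.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evofs_objectives(X, y, mask, classifier=None):
    # Return the three objectives for one candidate feature subset.
    # mask: boolean array selecting the features of the candidate subset.
    # All three returned values are to be minimized (the ANOVA term is negated).
    if classifier is None:
        classifier = LogisticRegression(max_iter=1000)  # assumed classifier, any sklearn estimator works
    X_sub = X[:, mask]

    # Objective 1: feature subset size.
    size = int(mask.sum())

    # Objective 2: mean test error over a 10-fold cross-validation.
    error = 1.0 - cross_val_score(classifier, X_sub, y, cv=10).mean()

    # Objective 3: ANOVA F-value of the weakest selected feature,
    # negated so that maximizing it fits a pure minimization setting.
    f_values, _ = f_classif(X_sub, y)
    weakest_f = float(np.min(f_values))

    return size, error, -weakest_f

# Example: evaluate a random candidate subset on a toy dataset.
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.3
if not mask.any():
    mask[0] = True  # guard against an empty subset
print(evofs_objectives(X, y, mask))

In a multi-objective evolutionary wrapper such as NSGA-II, a vector like the one returned here would serve as the fitness of each individual encoding a feature subset; the actual EvoFS encoding and search loop are not reproduced here.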
