Feature construction as a bi-level optimization problem

Feature selection and construction are important pre-processing techniques in data mining. They not only reduce dimensionality but can also improve classification accuracy and efficiency. While feature selection consists of selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. However, different features can have different abilities to distinguish different classes. Therefore, it may be more difficult to construct a strongly discriminating feature when combining features that are relevant to different classes. Based on these observations, feature construction can be seen as a Bi-Level Optimization Problem (BLOP), where the feature subset is defined at the upper level and feature construction is performed at the lower level by multiple followers, each of which generates a set of class-dependent constructed features. In this paper, we propose a new bi-level evolutionary approach for feature construction, called BCDFC, that uses Genetic Programming (GP) to construct multiple features, each of which focuses on distinguishing one class from the other classes. A detailed experimental study has been conducted on six high-dimensional datasets. The statistical analysis of the obtained results shows that our bi-level feature construction approach is competitive with, and outperforms, many state-of-the-art algorithms.
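The nested structure described above can be illustrated with a minimal Python sketch. It is not the BCDFC algorithm itself: the upper level here is a plain random search over feature subsets, and each lower-level follower replaces GP with a random search over two-operand expressions, returning a single constructed feature per class rather than a set. All names (`class_score`, `lower_level`, `upper_level`, `make_toy_data`), the separation score, and the toy data are assumptions made for illustration only.

```python
import random

# Binary operators a follower may use to combine two original features.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def class_score(values, labels, target):
    """Separation of `target` from all other classes (assumed fitness):
    absolute difference of the two means divided by the pooled spread."""
    inside = [v for v, y in zip(values, labels) if y == target]
    outside = [v for v, y in zip(values, labels) if y != target]
    if not inside or not outside:
        return 0.0
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    spread = sum((v - mean_in) ** 2 for v in inside)
    spread += sum((v - mean_out) ** 2 for v in outside)
    return abs(mean_in - mean_out) / ((spread / len(values)) ** 0.5 + 1e-9)


def lower_level(X, y, subset, target, rng, trials=30):
    """Follower for one class: search expressions built only from `subset`.
    Random search stands in for GP here."""
    best_score, best_expr = -1.0, None
    for _ in range(trials):
        i, j = rng.choice(subset), rng.choice(subset)
        op = rng.choice(sorted(OPS))
        values = [OPS[op](row[i], row[j]) for row in X]
        score = class_score(values, y, target)
        if score > best_score:
            best_score, best_expr = score, (op, i, j)
    return best_score, best_expr


def upper_level(X, y, rng, iterations=20):
    """Leader: propose feature subsets; each one is evaluated by running
    one follower per class and averaging the followers' best scores."""
    n_features = len(X[0])
    classes = sorted(set(y))
    best = (-1.0, None, None)
    for _ in range(iterations):
        subset = [f for f in range(n_features) if rng.random() < 0.5]
        subset = subset or [rng.randrange(n_features)]  # never empty
        followers = {c: lower_level(X, y, subset, c, rng) for c in classes}
        fitness = sum(s for s, _ in followers.values()) / len(classes)
        if fitness > best[0]:
            best = (fitness, subset, {c: e for c, (_, e) in followers.items()})
    return best


def make_toy_data(rng, per_class=20):
    """Hypothetical data: feature c is shifted for class c; the rest is noise."""
    X, y = [], []
    for label in range(3):
        for _ in range(per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(5)]
            row[label] += 3.0
            X.append(row)
            y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_toy_data(rng)
fitness, subset, constructed = upper_level(X, y, rng)
print("selected subset:", subset)
for label, expr in constructed.items():
    print("class", label, "-> constructed feature", expr)
```

The point of the sketch is the dependency between the levels: a candidate subset cannot be scored at the upper level until every class-specific follower has finished its own search restricted to that subset.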
