Feature construction as a bi-level optimization problem

Feature selection and feature construction are important preprocessing techniques in data mining. They enable not only dimensionality reduction but also improvements in classification accuracy and efficiency. While feature selection consists in selecting a subset of relevant features from the original feature set, feature construction generates new high-level features, called constructed features, each of which is a combination of a subset of the original features. Based on these definitions, feature construction can be seen as a bi-level optimization problem in which a feature subset is defined first and the corresponding (near) optimal combination of the selected features is then sought. Motivated by this observation, we propose in this paper a bi-level evolutionary approach for feature construction. The basic idea of our algorithm, named the bi-level feature construction genetic algorithm (BFC-GA), is to evolve an upper-level population for the task of feature selection while optimizing the feature combinations at the lower level by evolving a follower population. It is worth noting that for each upper-level individual (feature subset), a whole lower-level population is optimized to find the corresponding (near) optimal feature combination (constructed feature). In this way, BFC-GA is able to output a set of optimized constructed features that can be highly informative to the considered classifier. A detailed experimental study has been conducted on a set of commonly used datasets with varying dimensions. The statistical analysis of the obtained results shows the competitiveness and the outperformance of our bi-level feature construction approach with respect to many state-of-the-art algorithms.
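The nested structure described above can be illustrated with a minimal sketch. This is not the authors' BFC-GA implementation: the toy dataset, the linear weighted combination as the constructed feature, the threshold classifier used as a fitness surrogate, and all population sizes are illustrative assumptions. It only shows the key bi-level pattern: every upper-level individual (a feature subset) is evaluated by running a full lower-level evolution over combinations of those features.

```python
import random

random.seed(0)
N_FEATURES = 8

# Toy dataset (assumption): the class depends only on features 0 and 1.
X = [[random.random() for _ in range(N_FEATURES)] for _ in range(200)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

def fitness(subset, weights):
    """Accuracy of a mean-threshold classifier on the constructed feature."""
    feats = [sum(w * x[i] for i, w in zip(subset, weights)) for x in X]
    thr = sum(feats) / len(feats)
    return sum((f > thr) == (t == 1) for f, t in zip(feats, y)) / len(y)

def lower_level(subset, pop=10, gens=15):
    """Follower population: evolve combination weights for a FIXED subset."""
    population = [[random.uniform(-1, 1) for _ in subset] for _ in range(pop)]
    for _ in range(gens):
        parents = sorted(population, key=lambda w: fitness(subset, w),
                         reverse=True)[: pop // 2]
        children = [[w + random.gauss(0, 0.2) for w in p] for p in parents]
        population = parents + children  # elitist survival + Gaussian mutation
    best = max(population, key=lambda w: fitness(subset, w))
    return best, fitness(subset, best)

def upper_level(pop=8, gens=10):
    """Leader population: evolve feature subsets; each evaluation nests a
    complete lower-level run, as in the bi-level scheme described above."""
    rand_subset = lambda: sorted(random.sample(range(N_FEATURES),
                                               random.randint(1, 3)))
    population = [rand_subset() for _ in range(pop)]
    best_subset, best_w, best_fit = population[0], None, -1.0
    for _ in range(gens):
        scored = []
        for s in population:
            w, f = lower_level(s)          # nested optimization per individual
            scored.append((f, s))
            if f > best_fit:
                best_fit, best_subset, best_w = f, s, w
        scored.sort(key=lambda t: t[0], reverse=True)
        survivors = [s for _, s in scored[: pop // 2]]
        population = survivors + [rand_subset() for _ in range(pop - len(survivors))]
    return best_subset, best_w, best_fit

subset, weights, acc = upper_level()
```

The design point is the cost profile: every upper-level fitness evaluation triggers an entire lower-level evolution, which is what makes bi-level formulations expensive but also what lets the leader judge each subset by its best achievable combination rather than by a single arbitrary one.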
