A Multi-Objective Hybrid Filter-Wrapper Evolutionary Approach for Feature Construction on High-Dimensional Data

Feature selection and feature construction are important pre-processing techniques in data mining. They can reduce dimensionality and also improve classifier accuracy and efficiency, which makes them especially valuable for high-dimensional data. Feature construction on such data remains very challenging because the search space of feature combinations grows with the number of features. Recently, researchers have used Genetic Programming (GP) for feature construction, with promising results. However, wrapper evaluation of each feature subset, in which a feature may itself be constructed from a combination of original features, is computationally intensive because it requires running the classifier on the dataset. Motivated by this observation, we propose a hybrid multi-objective evolutionary approach for efficient feature construction and selection. Our approach uses two filter objectives and one wrapper objective, the latter corresponding to classification accuracy. The whole population is evaluated using the two filter objectives, whereas only the non-dominated (best) feature subsets are improved by an indicator-based local search that optimizes all three objectives simultaneously. We assessed our approach on six high-dimensional datasets and compared it with two existing prominent GP approaches, using three different classifiers for accuracy evaluation. The results show that our approach provides competitive or better results than the two competitor GP algorithms tested in this study.
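
The evaluation scheme described above, cheap filter objectives for every candidate and the expensive wrapper objective only for the non-dominated ones, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `dominates`, `non_dominated`, and `hybrid_evaluate` are hypothetical, both filter objectives are assumed to be maximized, and the indicator-based local search is omitted; the sketch only shows how Pareto filtering limits the number of wrapper (classifier) evaluations.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance under maximization (an assumption of this sketch):
    a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def hybrid_evaluate(
    population: List[object],
    filter_scores: Callable[[object], Tuple[float, float]],
    wrapper_accuracy: Callable[[object], float],
) -> Dict[int, Tuple[float, ...]]:
    """Score every candidate with the two cheap filter objectives, then call
    the expensive wrapper objective only on the non-dominated candidates.

    Returns a mapping from population index to
    (filter objective 1, filter objective 2, accuracy).
    """
    scores = [tuple(filter_scores(ind)) for ind in population]
    front = non_dominated(scores)
    return {i: scores[i] + (wrapper_accuracy(population[i]),) for i in front}
```

In a full algorithm, the three-objective vectors returned for the front would be the starting points of the local search; here they only make visible that dominated candidates never reach the classifier.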
