A filter-based feature construction and feature selection approach for classification using Genetic Programming

Abstract Feature construction and feature selection are two common pre-processing methods for classification. Genetic Programming (GP) is well suited to both tasks because of its flexible representation. In this paper, we propose FCM, a filter-based multiple feature construction approach using GP that stores the top individuals, and we employ FS, a filter-based feature selection approach using GP that relies on a correlation-based evaluation method. We also propose FCMFS, a hybrid approach that first constructs multiple features using FCM and then selects effective features using FS. Experiments on nine datasets show that the features selected by FS and the features constructed by FCM both improve classification performance compared with the original features. FCMFS maintains the classification performance of FCM with fewer features, and on the majority of the nine datasets it achieves better classification performance than FS with fewer features. Compared with FSFCM, another hybrid approach that first selects features using FS and then constructs features using FCM, FCMFS achieves better classification performance with fewer features. Comparisons with three state-of-the-art techniques show that FCMFS achieves better experimental results in most cases.
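To make the idea of a correlation-based filter evaluation concrete, the following is a minimal sketch of the subset merit heuristic from Hall's correlation-based feature selection (CFS). It is an illustration under assumptions, not the paper's implementation: the abstract does not specify the exact measure used by FS, the function name `cfs_merit` is hypothetical, and absolute Pearson correlation is used here as a stand-in for whatever correlation measure the authors adopt. The sketch also assumes no feature is constant, since a constant column makes the correlation undefined.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Merit of a feature subset: high when features correlate with the
    class and low when they correlate with each other.

    Assumption: absolute Pearson correlation stands in for the
    correlation measure; the paper's exact choice is not given here.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    Xs = X[:, subset]
    # Mean absolute feature-class correlation.
    r_cf = np.mean([abs(np.corrcoef(Xs[:, i], y)[0, 1]) for i in range(k)])
    if k == 1:
        return float(r_cf)
    # Mean absolute feature-feature correlation over distinct pairs
    # (subtract k to drop the diagonal of ones).
    corr = np.abs(np.corrcoef(Xs, rowvar=False))
    r_ff = (corr.sum() - k) / (k * (k - 1))
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))
```

Because the merit is computed from the data alone, without training a classifier, it can serve as a filter-style fitness function: a subset of two complementary features scores higher than a subset containing a feature and its exact duplicate.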
