Multidimensional genetic programming for multiclass classification

Abstract: We describe a new multiclass classification method that learns multidimensional feature transformations using genetic programming. The method optimizes models by first transforming the feature space into a new space of potentially different dimensionality and then classifying samples with a distance function in the transformed space. We analyze a novel program representation for encoding multidimensional features in genetic programming and compare it to other approaches. Similarly, we compare distance-based classification to the simpler techniques more commonly used when applying genetic programming to multiclass classification. Finally, we compare the method to several state-of-the-art classification techniques across a broad set of problems and show that it achieves competitive test accuracies while also producing concise models. We also quantify the scalability of the method on problems of varying dimensionality, sample size, and difficulty. The results suggest that the proposed method scales well to large feature spaces.
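The two-stage scheme in the abstract (transform the features, then classify by distance in the transformed space) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `transform` function is a hand-written stand-in for a program that genetic programming would evolve, Euclidean distance to class centroids stands in for the unspecified distance function, and the toy data and all names are hypothetical.

```python
import numpy as np

def transform(X):
    """Hypothetical stand-in for an evolved multidimensional program.

    Maps 2 raw features to 2 synthesized features. In the method described
    above, this mapping would be learned by genetic programming and its
    output dimensionality could differ from the input dimensionality.
    """
    x0, x1 = X[:, 0], X[:, 1]
    return np.column_stack([x0 * x1, x0 ** 2 + x1 ** 2])

def fit_centroids(Z, y):
    """Compute one centroid per class in the transformed space."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(Z, classes, centroids):
    """Assign each sample to the class with the nearest centroid.

    Euclidean distance is an assumption of this sketch; the abstract only
    says that a distance function is used.
    """
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Toy three-class problem: class 0 has x0*x1 > 0, class 1 has x0*x1 < 0,
# and class 2 lies close to the origin.
X_train = np.array([
    [2.0, 2.0], [-2.0, -2.0], [2.5, 2.0], [-2.0, -2.5],    # class 0
    [2.0, -2.0], [-2.0, 2.0], [2.5, -2.0], [-2.0, 2.5],    # class 1
    [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2], [-0.2, -0.1],    # class 2
])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])

classes, centroids = fit_centroids(transform(X_train), y_train)
train_pred = predict(transform(X_train), classes, centroids)

X_new = np.array([[3.0, 3.0], [-3.0, 3.0], [0.1, 0.1]])
new_pred = predict(transform(X_new), classes, centroids)
```

In the raw feature space the three class centroids of this toy data nearly coincide, so a centroid-based classifier has little to work with; after the transformation the classes form well-separated clusters. That is the motivation for optimizing the transformation rather than the classifier.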
