Learning feature spaces for regression with genetic programming

Genetic programming has recently found success as a tool for learning sets of features for regression and classification. Multidimensional genetic programming is a useful variant for this task because it represents candidate solutions as sets of programs, and these sets expose additional information that can be exploited to identify building blocks. In this work, we discuss this architecture and others in terms of how readily they allow heuristic search to use such information during the evolutionary process. We investigate methods for biasing which program components are promoted, in order to guide search toward useful and complementary feature spaces. We study two main approaches: (1) the introduction of new objectives and (2) the use of specialized semantic variation operators. We find that a semantic crossover operator based on stagewise regression leads to significant improvements on a set of regression problems. With semantic crossover included, the method achieves state-of-the-art results in a large benchmark study of open-source regression problems, compared with several leading machine learning approaches and other genetic programming frameworks. Finally, we examine the collinearity and complexity of the data representations produced by different methods, to assess whether relevant, concise, and independent factors of variation can be produced in practice.
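The abstract does not spell out how the stagewise-regression-based crossover works, so the following is only a minimal sketch of the general forward stagewise idea it appears to build on: from a pool of candidate features (for example, the program outputs of two parents), repeatedly pick the feature most correlated with the current residual and subtract its fitted contribution. The function name `stagewise_select`, its parameters, and the pooling of parent features are assumptions made for illustration, not the operator defined in the paper.

```python
import numpy as np

def stagewise_select(features, y, n_select):
    """Forward stagewise selection (illustrative sketch, not the paper's operator).

    features : (n_samples, n_features) array of candidate feature outputs,
               e.g. the pooled program outputs of two parents (an assumption).
    y        : (n_samples,) regression target.
    Returns the indices of the chosen columns and the final residual.
    """
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)                 # center each candidate feature
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0                # guard against constant columns
    X = X / norms                          # scale columns to unit length
    residual = np.asarray(y, dtype=float) - np.mean(y)
    chosen = []
    for _ in range(n_select):
        # Inner product of each unit-length feature with the residual.
        corr = X.T @ residual
        if chosen:
            corr[chosen] = 0.0             # do not pick the same feature twice
        best = int(np.argmax(np.abs(corr)))
        chosen.append(best)
        # For a unit-length column, corr[best] is its least-squares coefficient.
        residual = residual - corr[best] * X[:, best]
    return chosen, residual

# Usage: the target depends on columns 2 and 0 of a random candidate pool.
rng = np.random.default_rng(0)
pool = rng.normal(size=(50, 4))
target = 2.0 * pool[:, 2] - 0.5 * pool[:, 0]
selected, res = stagewise_select(pool, target, n_select=2)
```

In a crossover setting, the selected columns would correspond to the programs copied into the offspring; how the paper chooses the pool and the number of features retained is not stated here.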
