Model-based Genetic Programming with GOMEA for Symbolic Regression of Small Expressions

The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) has been shown to be a top-performing EA in several domains, including Genetic Programming (GP). Unlike traditional EAs, where variation acts randomly, GOMEA learns a model of the interdependencies within the genotype, i.e., the linkage, to estimate which patterns to propagate. In this article, we study the role of the Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniform distribution of the genotype in GP populations negatively biases LL, and we propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt to SR a scheme of interleaved runs that alleviates the burden of tuning the population size, a crucial parameter for LL. We run experiments on 10 real-world datasets, enforcing a strict limit on solution size to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR.
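
As a rough illustration of what learning linkage from a population can involve, the sketch below estimates the pairwise dependence between genotype positions with empirical mutual information. It rests on assumptions that the abstract does not state: each individual is encoded as a fixed-length tuple of symbols, mutual information is the dependence measure, and the names `mutual_information` and `linkage_matrix` are hypothetical. It is not the article's implementation, and it does not include the proposed bias correction.

```python
from collections import Counter
from math import log


def mutual_information(population, i, j):
    """Empirical mutual information (in nats) between positions i and j."""
    n = len(population)
    count_i = Counter(g[i] for g in population)
    count_j = Counter(g[j] for g in population)
    count_ij = Counter((g[i], g[j]) for g in population)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (c / n) * log(c * n / (count_i[a] * count_j[b]))
    return mi


def linkage_matrix(population):
    """Symmetric matrix of pairwise mutual information over all positions."""
    length = len(population[0])
    return [
        [mutual_information(population, i, j) for j in range(length)]
        for i in range(length)
    ]


# Toy population (assumed encoding): positions 0 and 1 always change
# together, while position 2 varies independently of both.
population = [
    ("+", "x", "1"),
    ("+", "x", "2"),
    ("*", "y", "1"),
    ("*", "y", "2"),
]

matrix = linkage_matrix(population)
```

In this toy population, the entry for positions 0 and 1 equals log 2, while the entries pairing position 2 with either of the others are 0. A linkage model built from such a matrix would therefore tend to group positions 0 and 1 together. If some symbols are far more frequent than others at certain positions, estimates of this kind can be skewed, which is the sort of bias the abstract refers to.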
