Online Diversity Control in Symbolic Regression via a Fast Hash-based Tree Similarity Measure

Diversity is an important aspect of genetic programming and is directly correlated with search performance. At the genotype level, measuring diversity often requires expensive tree distance measures, which have a negative impact on the algorithm's runtime performance. In this work we introduce a fast, hash-based tree distance measure that massively speeds up the calculation of population diversity during the algorithmic run. We combine this measure with the standard GA and the NSGA-II genetic algorithms to steer the search towards higher diversity. We validate the approach on a collection of benchmark problems for symbolic regression, where our method consistently outperforms the standard GA as well as NSGA-II configurations with different secondary objectives.
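To illustrate why hashing makes genotype-level diversity cheap, here is a minimal sketch. It assumes a Merkle-style bottom-up hash, where each node's hash combines its label with its children's hashes, and a Sørensen-Dice coefficient over the multisets of subtree hashes. It is not necessarily the exact formulation used in this work. The names `Node`, `subtree_hashes`, `hash_profile`, `dice_similarity`, `population_diversity` and the `COMMUTATIVE` symbol set are hypothetical and introduced here only for illustration.

```python
from __future__ import annotations

import hashlib
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations

# Assumption (hypothetical): operand order is irrelevant for these symbols,
# so their child hashes are sorted before combining.
COMMUTATIVE = {"+", "*"}


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def _digest(data: bytes) -> int:
    """Deterministic 64-bit hash (Python's built-in str hash is salted per process)."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def subtree_hashes(node: Node) -> tuple[int, list[int]]:
    """Hash a tree bottom-up, Merkle style.

    Returns the hash of `node` plus the hashes of every subtree in it
    (including `node` itself), so identical subtrees get identical values.
    """
    collected: list[int] = []
    child_hashes: list[int] = []
    for child in node.children:
        h, sub = subtree_hashes(child)
        child_hashes.append(h)
        collected.extend(sub)
    if node.label in COMMUTATIVE:
        child_hashes.sort()
    # The label is NUL-terminated so it cannot run into the child digests.
    payload = node.label.encode() + b"\x00" + b"".join(
        h.to_bytes(8, "big") for h in child_hashes
    )
    h = _digest(payload)
    collected.append(h)
    return h, collected


def hash_profile(tree: Node) -> Counter[int]:
    """Multiset of subtree hashes; computed once per individual and reused."""
    return Counter(subtree_hashes(tree)[1])


def dice_similarity(a: Counter[int], b: Counter[int]) -> float:
    """Sorensen-Dice coefficient over two subtree-hash multisets."""
    common = sum((a & b).values())
    return 2.0 * common / (sum(a.values()) + sum(b.values()))


def population_diversity(population: list[Node]) -> float:
    """Mean pairwise distance (1 - similarity) across the population."""
    profiles = [hash_profile(t) for t in population]
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - dice_similarity(a, b) for a, b in pairs) / len(pairs)


# Usage: x + y, y + x and x * y.
t1 = Node("+", [Node("x"), Node("y")])
t2 = Node("+", [Node("y"), Node("x")])  # same hash as t1 (commutative)
t3 = Node("*", [Node("x"), Node("y")])  # shares only the two leaves with t1
```

Here `t1` and `t2` have similarity 1.0. `t1` and `t3` have similarity 2·2/(3+3) = 2/3. The diversity of `[t1, t2, t3]` is therefore (0 + 1/3 + 1/3)/3 ≈ 0.222. Each tree is hashed only once, so every pairwise comparison reduces to a multiset intersection that is linear in tree size, instead of a full tree-distance computation. Feeding this diversity value into GA selection or using it as a secondary NSGA-II objective is not shown here.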
