Preserving Population Diversity Based on Transformed Semantics in Genetic Programming for Symbolic Regression

Population diversity plays an important role in avoiding premature convergence in evolutionary techniques, including genetic programming (GP). Obtaining an adequate level of diversity during the evolutionary process has become a concern of much previous research in GP. This work proposes a new novelty metric for an entropy-based diversity measure for GP. The new novelty metric is based on the transformed semantics of models in GP, where the semantics of a model is the set of its outputs on the training data, and principal component analysis (PCA) is used to transform the semantics. Based on the new novelty metric, a new diversity-preserving framework, which incorporates a new fitness function and a new selection operator, is proposed to help GP achieve a good balance between exploration and exploitation, thus enhancing its learning and generalization performance. Compared with two state-of-the-art diversity-preserving methods, the new method generalizes better and reduces the overfitting trend more effectively in most cases. Further examination of the properties of the search process confirms that the new framework notably enhances the evolvability and locality of GP.
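The abstract does not give the exact formulas for the novelty metric or the entropy measure. The sketch below is a minimal illustration under our own assumptions, not the authors' method. The semantics matrix holds one row per model and one column per training case. PCA is computed via SVD of the centered matrix, only the first principal component is kept, and entropy is taken over an equal-width histogram of that component. The function names and the `n_bins` parameter are hypothetical.

```python
import numpy as np


def transformed_semantics(semantics: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Project each model's semantics onto the leading principal components.

    Row i of `semantics` holds model i's outputs on the training cases.
    Assumption: plain PCA via SVD of the column-centered matrix.
    """
    centered = semantics - semantics.mean(axis=0)
    # Rows of `vt` are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def entropy_diversity(coords: np.ndarray, n_bins: int = 10) -> float:
    """Shannon entropy of the population over equal-width bins of the first component.

    Hypothetical binning scheme: higher entropy means the models are spread
    more evenly across the transformed semantic space; 0 means all models
    fall into a single bin.
    """
    counts, _ = np.histogram(coords[:, 0], bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


# A converged population: every model produces the same outputs.
converged = np.tile([1.0, 2.0, 3.0], (5, 1))
# A spread-out population: four models with evenly spaced outputs.
spread = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

print(entropy_diversity(transformed_semantics(converged)))  # no diversity
print(entropy_diversity(transformed_semantics(spread)))     # each model in its own bin
```

In this sketch the converged population scores 0, while the four evenly spaced models land in four different bins and score ln 4. Working in the PCA-transformed space rather than on the raw output vectors means the entropy is measured along the directions in which the population's semantics actually vary.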
