$\epsilon$-Lexicase selection: a probabilistic and multi-objective analysis of lexicase selection in continuous domains

Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when choosing parents. While previous work has demonstrated that lexicase selection can solve difficult problems, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula for the expected probability of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to characterize its behavior: lexicase selection selects individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain population sizes and numbers of training cases, and why it has been observed to perform poorly in continuous error spaces. To address this last concern, we introduce $\epsilon$-lexicase selection, which relaxes the pass condition of lexicase selection so that near-elite individuals, those within $\epsilon$ of the best error on a case, also pass that case, thereby improving selection performance with continuous errors. We show that $\epsilon$-lexicase selection outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems.
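The selection procedure, its exact selection probabilities, and the relaxed $\epsilon$ pass condition can be illustrated with a short sketch. It is not the authors' implementation: the function names (`lexicase_select`, `selection_probability`, `case_epsilons`, `elites`) are hypothetical, and the choice of $\epsilon$ is an assumption. Here each case gets one $\epsilon$, computed as the median absolute deviation (MAD) of the population's errors on that case, and an individual passes if its error is within $\epsilon$ of the best error in the *current pool*. The probability function uses a brute-force recursion over which case comes next. It follows the same random-case-order, uniform-tie-break logic as the selector, but it may not match the paper's formula in notation or efficiency, and its cost grows factorially with the number of cases.

```python
import random
from statistics import median
from typing import List, Optional, Sequence

# errors[i][t] = error of individual i on training case t (lower is better).
Errors = Sequence[Sequence[float]]


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation of a sequence of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def case_epsilons(errors: Errors) -> List[float]:
    """ASSUMED epsilon rule: one threshold per case, the MAD of the population's errors."""
    return [mad([row[t] for row in errors]) for t in range(len(errors[0]))]


def elites(errors: Errors, pool: Sequence[int], case: int, eps: float) -> List[int]:
    """Members of pool within eps of the pool's best error on case (eps=0: exact elites)."""
    best = min(errors[i][case] for i in pool)
    return [i for i in pool if errors[i][case] <= best + eps]


def lexicase_select(errors: Errors, eps: Optional[Sequence[float]] = None,
                    rng: Optional[random.Random] = None) -> int:
    """Pick one parent index. eps=None gives standard lexicase selection."""
    rng = rng if rng is not None else random.Random()
    n_cases = len(errors[0])
    eps = eps if eps is not None else [0.0] * n_cases
    pool = list(range(len(errors)))
    cases = list(range(n_cases))
    rng.shuffle(cases)  # fresh random case ordering for every selection event
    for t in cases:
        pool = elites(errors, pool, t, eps[t])
        if len(pool) == 1:
            break
    return rng.choice(pool)  # uniform random tie-break among the survivors


def selection_probability(errors: Errors, i: int,
                          pool: Optional[List[int]] = None,
                          cases: Optional[List[int]] = None,
                          eps: Optional[Sequence[float]] = None) -> float:
    """Exact probability that lexicase_select returns i, by recursing over the next case."""
    n_cases = len(errors[0])
    eps = eps if eps is not None else [0.0] * n_cases
    pool = list(range(len(errors))) if pool is None else pool
    cases = list(range(n_cases)) if cases is None else cases
    if i not in pool:
        return 0.0
    if len(pool) == 1:
        return 1.0
    if not cases:
        return 1.0 / len(pool)  # cases exhausted: uniform tie-break
    total = 0.0
    for t in cases:  # each remaining case is equally likely to come next
        rest = [c for c in cases if c != t]
        total += selection_probability(errors, i, elites(errors, pool, t, eps[t]), rest, eps)
    return total / len(cases)


if __name__ == "__main__":
    # Two specialists, one near-elite generalist, one poor individual.
    errors = [[0.0, 2.0], [2.0, 0.0], [0.05, 0.05], [3.0, 3.0]]
    eps = case_epsilons(errors)
    for i in range(len(errors)):
        print(i, selection_probability(errors, i), selection_probability(errors, i, eps=eps))
```

In this toy population every case has a unique exact elite, as is typical with continuous errors, so standard lexicase only ever selects the two specialists (probability 0.5 each) and never the generalist at index 2. With the assumed MAD-based $\epsilon$ (about 1.0 on both cases here), the generalist passes every case and is selected with probability 1.0. Measuring the best error within the current pool rather than across the whole population is one design choice among several possible $\epsilon$ variants.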
