Discovering a domain alphabet

A key to the success of any genetic programming process is a good alphabet of atomic building blocks from which solutions can be evolved efficiently. An alphabet that is too fine-grained may generate an unnecessarily large search space, while an inappropriately coarse-grained alphabet may bias the search or prevent it from finding optimal solutions. Here we introduce a method that automatically identifies a small alphabet for a problem domain. We process solutions on the complexity-optimality Pareto fronts of a number of sample systems and identify terms that appear significantly more frequently than would be expected given their size. These terms are then used as basic building blocks to solve new problems in the same domain. We demonstrate this process on symbolic regression for a variety of physics problems. The method discovers key terms relating to concepts such as energy and momentum, and using these terms as basic building blocks yields a significant performance improvement on new physics problems. We suggest that identifying a problem-specific alphabet is key to scaling evolutionary methods to higher-complexity systems.
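
The selection step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: expressions are encoded as nested tuples, complexity is taken to be node count, and "more frequently than would be expected given their size" is approximated by fitting log(count) against size with ordinary least squares and ranking each term by its residual above that line. The names `pareto_front`, `subtrees` and `overrepresented_terms`, and the toy models for two sample systems, are hypothetical.

```python
import math
from collections import Counter
from typing import Union

# Hypothetical encoding: a leaf is a str (variable or constant name); an
# internal node is a tuple (operator, child, ...). Tuples are hashable,
# so identical subtrees can be counted directly.
Expr = Union[str, tuple]


def size(e: Expr) -> int:
    """Node count of an expression tree, used here as its complexity."""
    if isinstance(e, str):
        return 1
    return 1 + sum(size(c) for c in e[1:])


def pareto_front(models):
    """Keep (expr, error) pairs not dominated in (size, error); lower is better."""
    front = []
    for e, err in models:
        dominated = any(
            size(o) <= size(e) and o_err <= err
            and (size(o) < size(e) or o_err < err)
            for o, o_err in models
        )
        if not dominated:
            front.append((e, err))
    return front


def subtrees(e: Expr):
    """Yield e and every subtree below it."""
    yield e
    if not isinstance(e, str):
        for c in e[1:]:
            yield from subtrees(c)


def overrepresented_terms(fronts, min_size=2):
    """Rank terms by how far their log-count exceeds a size-based fit.

    Assumed baseline (illustrative only): log(count) is modelled as a
    linear function of size; a term's score is its residual above the line.
    """
    counts = Counter()
    for front in fronts:
        for e, _err in front:
            counts.update(subtrees(e))
    terms = [(t, size(t), c) for t, c in counts.items() if size(t) >= min_size]
    if not terms:
        return []
    n = len(terms)
    mean_s = sum(s for _, s, _ in terms) / n
    mean_y = sum(math.log(c) for _, _, c in terms) / n
    sxx = sum((s - mean_s) ** 2 for _, s, _ in terms)
    sxy = sum((s - mean_s) * (math.log(c) - mean_y) for _, s, c in terms)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_s
    scored = [(t, math.log(c) - (intercept + slope * s)) for t, s, c in terms]
    return sorted(scored, key=lambda ts: ts[1], reverse=True)


# Toy usage with made-up models for two hypothetical sample systems.
ke = ("mul", "m", ("sq", "v"))  # m*v^2, an energy-like term
p = ("mul", "m", "v")           # m*v, a momentum-like term
system_a = [(ke, 0.1), (("add", ke, ("mul", "m", "g")), 0.01),
            (("add", p, ("sq", "x")), 0.5), ("m", 1.0)]
system_b = [(p, 0.2), (("add", p, ke), 0.05), (("sq", "x"), 0.9)]
fronts = [pareto_front(system_a), pareto_front(system_b)]
for term, score in overrepresented_terms(fronts)[:3]:
    print(term, round(score, 2))
```

On this toy data the dominated model `m*v + x^2` is dropped from the first front, and the ranking places `m*v^2` first, followed by `v^2` and `m*v`. The top-ranked terms would then be added as extra terminals for runs on a new system; the log-linear baseline is only one plausible way to discount small terms, which occur often simply because they are small.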
