Semantic and structural analysis of genetic programming

Abstract

Genetic programming (GP) is a subset of evolutionary computation in which candidate solutions are evaluated through execution or interpreted execution. The candidate solutions generated by GP take the form of computer programs, which are evolved to achieve a stated objective. The processes that make up GP (crossover, mutation and selection) are inspired by Darwinian evolutionary theory. During a GP run, crossover, mutation and selection are performed iteratively until a program that satisfies the stated objectives is produced or a certain number of time steps has elapsed. The objective of this thesis is to empirically analyse three aspects of these evolved programs: diversity, efficient representation and the changing structure of programs during evolution. In addition to these analyses, novel algorithms are presented in order to test theories, improve the overall performance of GP and reduce program size. This thesis makes three contributions to the field of GP. Firstly, a detailed analysis is performed of the process of initialisation (generating random programs to start evolution), using four novel algorithms to empirically evaluate specific traits of starting populations of programs. It is shown that two factors simultaneously affect how well a starting population performs after a GP run. Secondly, semantically based operators are applied during evolution to encourage behavioural diversity and to reduce the size of programs by removing inefficient segments of code. A series of experiments demonstrates that these specialist operators can be effective both individually and in combination. Finally, the role of program structure during evolution is examined under different evolutionary parameters and across different problem domains. This analysis reveals some interesting effects of evolution on program structure and offers evidence to support the success of the specialist operators.
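
The iterative process described in the abstract (initialise random programs, evaluate them by execution, then repeat selection, crossover and mutation until an objective is met or a generation limit is reached) can be illustrated with a minimal sketch. This is not the thesis's implementation. Every name here (`random_tree`, `run`, `evolve` and so on) is hypothetical, and the symbolic regression target, tournament selection, elitism, mutation rate and node cap are assumptions chosen only to keep the example short.

```python
import operator
import random

# Assumed primitive set; programs are nested tuples such as ("add", "x", 1.0).
FUNCTIONS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
TERMINALS = ["x", 1.0]

def random_tree(rng, depth):
    """Initialisation: grow a random program tree up to the given depth."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def run(tree, x):
    """Evaluate a candidate by interpreting (executing) it on input x."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](run(left, x), run(right, x))
    return x if tree == "x" else tree

def error(tree, cases):
    """Sum of absolute errors over the test cases (lower is better)."""
    return sum(abs(run(tree, x) - y) for x, y in cases)

def paths(tree, path=()):
    """Yield the path (sequence of child indices) to every node."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from paths(tree[i], path + (i,))

def size(tree):
    return sum(1 for _ in paths(tree))

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Subtree crossover: a random subtree of b replaces one in a."""
    donor = get_at(b, rng.choice(list(paths(b))))
    return replace_at(a, rng.choice(list(paths(a))), donor)

def mutate(rng, tree):
    """Subtree mutation: a random subtree is replaced by a fresh one."""
    return replace_at(tree, rng.choice(list(paths(tree))), random_tree(rng, 2))

def select(rng, population, cases, k=3):
    """Tournament selection (an assumed choice of selection scheme)."""
    return min(rng.sample(population, k), key=lambda t: error(t, cases))

def evolve(cases, pop_size=60, generations=30, max_nodes=50, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        best = min(population, key=lambda t: error(t, cases))
        history.append(error(best, cases))
        if history[-1] == 0:  # objective satisfied: stop early
            break
        children = [best]  # elitism: carry the best program forward
        while len(children) < pop_size:
            parent = select(rng, population, cases)
            child = crossover(rng, parent, select(rng, population, cases))
            if rng.random() < 0.2:
                child = mutate(rng, child)
            if size(child) > max_nodes:  # crude size cap to limit growth
                child = parent
            children.append(child)
        population = children
    return min(population, key=lambda t: error(t, cases)), history

# Assumed toy objective: fit y = x*x + x at nine sample points.
CASES = [(i / 4.0, (i / 4.0) ** 2 + i / 4.0) for i in range(-4, 5)]

if __name__ == "__main__":
    best, history = evolve(CASES)
    print(best, error(best, CASES), len(history))
```

Because the best program is copied unchanged into each new generation, the recorded best error never increases. The node cap is a simple stand-in for the more careful size-control methods the thesis investigates.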
