Semantic schema theory for genetic programming

Schema theory is the best-known model of evolutionary algorithms. Following the approach taken for genetic algorithms (GA), nearly all schemata defined for genetic programming (GP) refer to a set of points in the search space that share some syntactic characteristics. In GP, however, syntactically similar individuals do not necessarily have similar semantics. Because the instances of a syntactic schema do not behave similarly, the corresponding schema theory becomes unreliable, and these theories have rarely been used to improve the performance of GP. The main objective of this study is to propose a schema theory that is a more realistic model of GP and that could potentially be employed to improve GP in practice. To this end, the concept of a semantic schema is introduced. This schema partitions the search space according to the semantics of trees, regardless of their syntactic variety. We interpret the semantics of a tree in terms of the mutual information between its output and the target. The semantic schema is characterized by a set of semantic building blocks and their joint probability distribution. After introducing the semantic building blocks, we present an algorithm for finding them in a given population, together with an extraction method that looks for the most significant schema of the population. Moreover, an exact microscopic schema theorem is proposed that predicts the expected number of schema samples in the next generation. Experimental results demonstrate that the proposed schema definition is capable of representing the semantics of the schema instances. They also indicate that the estimates given by the semantic schema theorem are more realistic than those obtained with previously defined schemata.
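
To make the notion of semantics used above more concrete, the following is a minimal sketch of how the mutual information between a tree's output and the target might be estimated. It assumes a plain equal-width histogram (plug-in) estimator; the function name `mutual_information`, the `bins` parameter, and the sample data are illustrative choices and are not taken from the study, which may rely on a different estimator.

```python
import math
from collections import Counter


def mutual_information(outputs, targets, bins=4):
    """Plug-in estimate of I(output; target) in bits.

    Assumption: both variables are discretized into `bins` equal-width
    bins. This is an illustrative estimator, not the study's method.
    """
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("outputs and targets must be non-empty and equally long")

    def discretize(values):
        lo, hi = min(values), max(values)
        if hi == lo:
            # A constant signal falls into a single bin.
            return [0] * len(values)
        width = (hi - lo) / bins
        # Clamp the maximum value into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    xs, ys = discretize(outputs), discretize(targets)
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)

    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed cells.
    mi = 0.0
    for (x, y), count in joint.items():
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


# Hypothetical fitness cases: the target and the outputs of three trees.
target = [0.0, 1.0, 2.0, 3.0] * 5
tree_a = list(target)                    # reproduces the target exactly
tree_b = [2.0 * t + 1.0 for t in target]  # different syntax, same information
tree_c = [5.0] * len(target)             # constant tree, carries no information

mi_a = mutual_information(tree_a, target)
mi_b = mutual_information(tree_b, target)
mi_c = mutual_information(tree_c, target)
```

Under this sketch, `tree_a` and `tree_b` receive the same score although they would be written as different expressions, while the constant `tree_c` scores zero. This is the sense in which a semantic view can group trees regardless of their syntactic variety.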
