Universal Consistency and Bloat in GP: Some theoretical considerations about Genetic Programming from a Statistical Learning Theory viewpoint

In this paper, we analyze Genetic Programming (GP) from the viewpoint of Statistical Learning Theory, in the setting of symbolic regression. First, we study Universal Consistency, i.e. the property that the error of the solution minimizing the empirical error converges to the best possible error as the number of examples goes to infinity. Second, we study the uncontrolled growth of program length (i.e. bloat), a well-known problem in GP. The results show that (1) several kinds of code bloat can be identified and (2) under some conditions, universal consistency can be obtained while bloat is avoided. We conclude by describing an ad hoc method that simultaneously avoids bloat and ensures universal consistency.
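
For readers unfamiliar with the term, universal consistency as described above can be written down as follows. This is only a sketch in notation that does not come from the abstract: the squared loss, the symbols $\hat L_n$, $L$, $L^*$, $\hat f_n$, and the program class $\mathcal{F}_n$ searched at sample size $n$ are all assumptions made for illustration, and the mode of convergence is deliberately left unspecified.

```latex
% Assumed notation (not from the abstract): examples (x_1,y_1),...,(x_n,y_n)
% drawn i.i.d. from an unknown distribution of (X,Y); squared loss.
\[
  \hat L_n(f) = \frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr)^2,
  \qquad
  L(f) = \mathbb{E}\bigl[(f(X)-Y)^2\bigr],
  \qquad
  L^* = \inf_{f} L(f).
\]
% The learned program minimizes the empirical error over the assumed class F_n.
\[
  \hat f_n \in \arg\min_{f \in \mathcal{F}_n} \hat L_n(f),
  \qquad
  L(\hat f_n) \longrightarrow L^* \quad \text{as } n \to \infty,
  \text{ for every distribution of } (X,Y).
\]
```

Here the infimum defining $L^*$ is taken over all measurable functions, which is what makes the property "universal": the limit must be the best possible error whatever the underlying distribution is.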
