Investigation of simplification threshold and noise level of input data in numerical simplification of genetic programs

In tree-based Genetic Programming (GP), program sizes tend to grow as a run proceeds without a corresponding improvement in fitness. This growth increases resource usage, both memory and CPU time, and may lead to over-fitting of the training data. Numerical simplification is a method for removing redundant code from program trees during the run. Compared with canonical genetic programming, numerical simplification can produce much smaller programs and require much shorter evolutionary training times, while achieving comparable effectiveness. A key parameter of this method is the simplification threshold. This paper examines whether there is a relationship between the noise level in the input data and the optimum value of the simplification threshold and, if so, what that relationship is. Our results suggest that such a relationship exists: the noise level is a lower bound for the optimum simplification threshold, and five times the noise level is an upper bound.
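The abstract does not spell out the simplification rules, so the sketch below is only an illustration under stated assumptions. It assumes two threshold-based rules applied bottom-up over the training inputs: (1) a subtree whose outputs vary by less than the threshold is replaced by a constant (its mean output), and (2) a node whose outputs stay within the threshold of one child's outputs at every training point is replaced by that child. The names `Node`, `evaluate`, `simplify` and `threshold_bounds` are hypothetical. The function set is limited to `+`, `-` and `*` for brevity, and the noise level is treated as a single number on the same scale as program outputs. `threshold_bounds` simply encodes the reported bounds (noise level to five times the noise level).

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    op: str                                   # "+", "-", "*", "x" (input) or "const"
    children: List["Node"] = field(default_factory=list)
    value: float = 0.0                        # used only when op == "const"


def evaluate(node: Node, xs: Sequence[float]) -> List[float]:
    """Evaluate a tree at every training input (xs must be non-empty)."""
    if node.op == "const":
        return [node.value] * len(xs)
    if node.op == "x":
        return list(xs)
    a = evaluate(node.children[0], xs)
    b = evaluate(node.children[1], xs)
    if node.op == "+":
        return [p + q for p, q in zip(a, b)]
    if node.op == "-":
        return [p - q for p, q in zip(a, b)]
    if node.op == "*":
        return [p * q for p, q in zip(a, b)]
    raise ValueError(f"unknown op {node.op!r}")


def simplify(node: Node, xs: Sequence[float], threshold: float) -> Node:
    """Hypothetical bottom-up numerical simplification with a fixed threshold."""
    # Simplify children first so parents see already-simplified subtrees.
    if node.children:
        node = Node(node.op, [simplify(c, xs, threshold) for c in node.children], node.value)
    if node.op == "const":
        return node
    out = evaluate(node, xs)
    # Assumed rule 1: a near-constant subtree becomes a constant (its mean output).
    if max(out) - min(out) < threshold:
        return Node("const", value=sum(out) / len(out))
    # Assumed rule 2: if a child alone reproduces this node's output to within
    # the threshold at every training input, the other child is redundant.
    for child in node.children:
        child_out = evaluate(child, xs)
        if max(abs(p - q) for p, q in zip(out, child_out)) < threshold:
            return child
    return node


def threshold_bounds(noise_level: float) -> Tuple[float, float]:
    """Reported range for the optimum threshold: [noise, 5 * noise]."""
    return noise_level, 5.0 * noise_level


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    # x + 0.001 * x: the product term contributes almost nothing.
    tree = Node("+", [Node("x"), Node("*", [Node("const", value=0.001), Node("x")])])
    lo, hi = threshold_bounds(0.002)
    print(simplify(tree, xs, threshold=hi).op)  # → x
```

With a threshold of 0.01, the `0.001 * x` term varies by only 0.003 over the inputs and collapses to a constant; the `+` node then differs from its `x` child by at most 0.0015, so the whole tree reduces to `x`. A threshold that is too large would also discard genuinely useful structure, which is why the choice of threshold relative to the noise level matters.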
