An Empirical Study of Functional Complexity as an Indicator of Overfitting in Genetic Programming

It has recently been argued that the complexity of a solution is a good indicator of how much it overfits. However, measuring the complexity of a program in Genetic Programming is not a trivial task. In this paper, we study functional complexity and how it relates to overfitting on symbolic regression problems. We consider two measures of complexity: Slope-based Functional Complexity, inspired by the concept of curvature, and Regularity-based Functional Complexity, based on the concept of Hölderian regularity. In general, both complexity measures appear to be poor indicators of program overfitting. However, the results suggest that Regularity-based Functional Complexity could provide a good indication of overfitting in extreme cases.
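To make the idea of a slope-based measure concrete, the following is a minimal sketch and not the definition used in the paper. It assumes a one-dimensional input and assumes that complexity is taken as the sum of absolute changes in slope between consecutive segments of the sampled function, a rough discrete stand-in for curvature. The function name `slope_based_complexity` and the handling of duplicate inputs are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def slope_based_complexity(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sum of absolute slope changes along a sampled 1-D function.

    Assumed definition for illustration only: sort the samples by input,
    compute the slope of each consecutive segment, and accumulate how much
    the slope changes from one segment to the next.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    # Order the samples by input value so that segments are adjacent.
    points = sorted(zip(xs, ys))

    slopes: List[float] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 == x0:
            continue  # assumption: skip duplicate inputs to avoid division by zero
        slopes.append((y1 - y0) / (x1 - x0))

    # A straight line has constant slope, so it contributes zero.
    return sum(abs(b - a) for a, b in zip(slopes, slopes[1:]))


if __name__ == "__main__":
    line = slope_based_complexity([0, 1, 2, 3], [0, 2, 4, 6])
    zigzag = slope_based_complexity([0, 1, 2, 3], [0, 1, 0, 1])
    print(line, zigzag)
```

Under this assumed definition, a program whose output is linear in the input scores zero, while one that oscillates between the sample points scores higher. That is the intuition behind treating such a quantity as a candidate indicator of overfitting.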
