A comparison of predictive measures of problem difficulty for classification with Genetic Programming

A difficult open question in Genetic Programming (GP) is how to determine problem difficulty. The overall goal of this paper is to develop predictive tools that estimate how difficult a problem is for GP to solve. We analyse two groups of methods. The first group, Evolvability Indicators (EI), comprises measures that capture how amenable the fitness landscape is to a GP search. The second group, Predictors of Expected Performance (PEP), comprises models that take as input a set of descriptive attributes of a problem and predict the expected performance of a GP system. These predictive variables are domain-specific, so problems are described in the context of the problem domain. This paper compares an EI, the Negative Slope Coefficient, and a PEP model for a GP classifier. Results suggest that the EI does not correlate with the performance of GP classifiers. Conversely, the PEP models show a high correlation with GP performance. It appears that while an EI estimates the difficulty of a search, it does not necessarily capture the difficulty of the underlying problem. PEP models, by contrast, treat GP as a computational black box, yet they can produce accurate performance predictions.
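To make the Evolvability Indicator concrete, the following is a minimal sketch of how a Negative Slope Coefficient could be computed from a fitness cloud, that is, a set of (parent fitness, offspring fitness) pairs. It is not the implementation used in the paper. The function name `negative_slope_coefficient`, the `n_bins` parameter, and the equal-count binning are assumptions made for illustration; the published NSC literature describes a more elaborate partitioning of the fitness axis.

```python
from statistics import mean


def negative_slope_coefficient(cloud, n_bins=4):
    """Sum the negative slopes between bin centroids of a fitness cloud.

    cloud  : list of (parent_fitness, offspring_fitness) pairs
    n_bins : number of equal-count bins (an assumption of this sketch)
    Returns 0.0 when no segment slopes downward, a negative value otherwise.
    """
    n = len(cloud)
    if n_bins < 2 or n < n_bins:
        raise ValueError("need at least 2 bins and one point per bin")

    # Order the cloud along the parent-fitness axis.
    points = sorted(cloud)

    # Split the ordered points into n_bins consecutive, non-empty groups.
    bins = [points[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]

    # Represent each bin by its centroid (mean parent, mean offspring fitness).
    centroids = [(mean(p for p, _ in b), mean(o for _, o in b)) for b in bins]

    nsc = 0.0
    for (x0, y0), (x1, y1) in zip(centroids, centroids[1:]):
        if x1 == x0:
            continue  # skip vertical segments; the slope is undefined
        slope = (y1 - y0) / (x1 - x0)
        nsc += min(0.0, slope)  # only negative slopes contribute
    return nsc
```

Under this sketch, a value of zero indicates that no segment of the binned cloud slopes downward, while more negative values indicate more downward-sloping segments. How such a value relates to the observed performance of a GP classifier is the question the paper examines.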
