Models of Performance of Evolutionary Program Induction Algorithms Based on Indicators of Problem Difficulty

Modeling the behavior of algorithms is the realm of evolutionary algorithm theory. From a practitioner's point of view, theory must provide guidelines on which algorithm and parameters to use in order to solve a particular problem. Unfortunately, most theoretical models of evolutionary algorithms are difficult to apply to realistic situations. However, in recent work (Graff and Poli, 2008, 2010), where we developed a method to practically estimate the performance of evolutionary program-induction algorithms (EPAs), we started addressing this issue. The method was quite general, but it suffered from three limitations: it required identifying a set of reference problems, it required hand-picking a distance measure for each particular domain, and the resulting models were opaque, typically being linear combinations of 100 or more features. In this paper, we propose a significant improvement of this technique that overcomes all three limitations of our previous method. We achieve this through the use of a novel and very general set of features for assessing problem difficulty for EPAs, essentially based on the notion of finite difference. To show the capabilities of our technique and to compare it with our previous performance models, we create models for the same two important classes of problems used in our previous work—symbolic regression on rational functions and Boolean function induction—and we model a variety of EPAs. For the majority of the algorithms and problem classes, the new method produced much simpler and more accurate models than before. To further illustrate the practicality of the technique and its generality (beyond EPAs), we have also used it to predict the performance of both autoregressive models and EPAs on the problem of wind speed forecasting; in all cases the resulting models are simpler and more accurate than those of our previous method.
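The abstract does not give the exact feature definitions, but the idea of characterizing a target function through finite differences can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the paper's actual feature set: we sample a target on a grid, take successive finite differences of the outputs, and summarize each order of difference by its average magnitude, yielding one difficulty-related feature per order. A rough or oscillating target produces larger finite differences than a smooth one.

```python
# Hypothetical sketch of finite-difference difficulty features, in the
# spirit of the method described in the abstract. Feature definitions and
# names here are illustrative assumptions, not the paper's actual ones.

def finite_difference_features(ys, max_order=3):
    """Summarize a target function sampled at equally spaced points by
    the mean absolute value of its successive finite differences
    (one feature per difference order)."""
    features = []
    diffs = list(ys)
    for _ in range(max_order):
        # First-order forward difference of the current sequence.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        features.append(sum(abs(d) for d in diffs) / len(diffs))
    return features

# Example: a smooth target vs. a maximally oscillating one on the same grid.
xs = [i / 10 for i in range(21)]
smooth = [x * x for x in xs]          # slowly varying
bumpy = [(-1) ** i for i in range(21)]  # alternates between +1 and -1

print(finite_difference_features(smooth))
print(finite_difference_features(bumpy))
```

In a performance model of the kind the abstract describes, such per-problem features would then feed a (for instance, linear) model that predicts an algorithm's expected performance on that problem; the oscillating target above gets much larger feature values than the smooth one, signaling a harder induction problem.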

[1]  David Maxwell Chickering,et al.  A Bayesian Approach to Tackling Hard Computational Problems (Preliminary Report) , 2001, Electron. Notes Discret. Math..

[2]  Kevin Leyton-Brown,et al.  A Portfolio Approach to Algorithm Selection , 2003, IJCAI.

[3]  Leonardo Vanneschi,et al.  A Study of Fitness Distance Correlation as a Difficulty Measure in Genetic Programming , 2005, Evolutionary Computation.

[4]  Xiaozhe Wang,et al.  Rule induction for forecasting method selection: Meta-learning the characteristics of univariate time series , 2009, Neurocomputing.

[5]  Yoav Shoham,et al.  Empirical hardness models: Methodology and a case study on combinatorial auctions , 2009, JACM.

[6]  Michail G. Lagoudakis,et al.  Learning to Select Branching Rules in the DPLL Procedure for Satisfiability , 2001, Electron. Notes Discret. Math..

[7]  Michail G. Lagoudakis,et al.  Algorithm Selection using Reinforcement Learning , 2000, ICML.

[8]  Riccardo Poli,et al.  General Schema Theory for Genetic Programming with Subtree-Swapping Crossover: Part I , 2003, Evolutionary Computation.

[9]  Eugene C. Freuder,et al.  Using CBR to Select Solution Strategies in Constraint Programming , 2005, ICCBR.

[10]  Bradley Efron,et al.  An Introduction to the Bootstrap , 1993 .

[11]  Una-May O'Reilly,et al.  Program Search with a Hierarchical Variable Length Representation: Genetic Programming, Simulated Annealing and Hill Climbing , 1994, PPSN.

[12]  Kevin Leyton-Brown,et al.  SATzilla-07: The Design and Analysis of an Algorithm Portfolio for SAT , 2007, CP.

[13]  Manuela M. Veloso,et al.  Learning to Predict Performance from Formula Modeling and Training Data , 2000, ICML.

[14]  Sébastien Vérel,et al.  Fitness Clouds and Problem Hardness in Genetic Programming , 2004, GECCO.

[15]  Riccardo Poli,et al.  Fitness-proportional negative slope coefficient as a hardness measure for genetic algorithms , 2007, GECCO '07.

[16]  Goldberg,et al.  Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[17]  Steven C. Wheelwright,et al.  Forecasting: Methods and Applications, 3rd Ed , 1997 .

[18]  David E. Goldberg,et al.  Genetic Algorithms in Search, Optimization and Machine Learning , 1988 .

[19]  Yoav Shoham,et al.  Boosting as a Metaphor for Algorithm Design , 2003, CP.

[20]  Nancy M. Amato,et al.  A framework for adaptive algorithm selection in STAPL , 2005, PPoPP.

[21]  Aoife Foley,et al.  Current methods and advances in forecasting of wind power generation , 2012 .

[22]  Kevin Leyton-Brown,et al.  Hydra: Automatically Configuring Algorithms for Portfolio-Based Selection , 2010, AAAI.

[23]  Andrew W. Moore,et al.  Learning Evaluation Functions for Global Optimization and Boolean Satisfiability , 1998, AAAI/IAAI.

[24]  Alex A. Freitas,et al.  Evolutionary Computation , 2002 .

[25]  Riccardo Poli,et al.  Practical performance models of algorithms in evolutionary program induction and other domains , 2010, Artif. Intell..

[26]  Leonardo Vanneschi,et al.  A Survey of Problem Difficulty in Genetic Programming , 2005, AI*IA.

[27]  Leonardo Franco,et al.  Generalization ability of Boolean functions implemented in feedforward neural networks , 2006, Neurocomputing.

[28]  Cândida Ferreira,et al.  Gene Expression Programming: A New Adaptive Algorithm for Solving Problems , 2001, Complex Syst..

[29]  Yoav Shoham,et al.  Empirical Hardness Models for Combinatorial Auctions , 2005 .

[30]  John R. Koza,et al.  Genetic Programming IV: Routine Human-Competitive Machine Intelligence , 2003 .

[31]  Cormac Gebruers,et al.  Machine Learning for Portfolio Selection Using Structure at the Instance Level , 2004, CP.

[32]  Riccardo Poli,et al.  Foundations of Genetic Programming , 1999, Springer Berlin Heidelberg.

[33]  Terry Jones,et al.  Fitness Distance Correlation as a Measure of Problem Difficulty for Genetic Algorithms , 1995, ICGA.

[34]  Eric A. Brewer,et al.  High-level optimization via automated statistical modeling , 1995, PPOPP '95.

[35]  John R. Koza,et al.  Human-competitive results produced by genetic programming , 2010, Genetic Programming and Evolvable Machines.

[36]  Riccardo Poli,et al.  Practical Model of Genetic Programming's Performance on Rational Symbolic Regression Problems , 2008, EuroGP.

[37]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[38]  Leonardo Vanneschi,et al.  Difficulty of Unimodal and Multimodal Landscapes in Genetic Programming , 2003, GECCO.

[39]  Steven C. Wheelwright,et al.  Forecasting methods and applications. , 1979 .

[40]  Brahim Hnich,et al.  Making Choices Using Structure at the Instance Level within a Case Based Reasoning Framework , 2004, CPAIOR.

[41]  Christian Bessière Principles and Practice of Constraint Programming - CP 2007, 13th International Conference, CP 2007, Providence, RI, USA, September 23-27, 2007, Proceedings , 2007, CP.

[42]  Sébastien Vérel,et al.  Negative Slope Coefficient: A Measure to Characterize Genetic Programming Fitness Landscapes , 2006, EuroGP.

[43]  Yoav Shoham,et al.  Understanding Random SAT: Beyond the Clauses-to-Variables Ratio , 2004, CP.

[44]  Ethem Alpaydin,et al.  Introduction to machine learning , 2004, Adaptive computation and machine learning.

[45]  Julian Francis Miller,et al.  Cartesian genetic programming , 2010, GECCO.

[46]  Kate Smith-Miles,et al.  Cross-disciplinary perspectives on meta-learning for algorithm selection , 2009, CSUR.

[47]  L. Vanneschi,et al.  Pros and Cons of Fitness Distance Correlation in Genetic Programming , 2003 .

[48]  Sheldon B. Akers, Jr.,et al.  On a Theory of Boolean Functions , 1959 .

[49]  James Demmel,et al.  Statistical Models for Automatic Performance Tuning , 2001, International Conference on Computational Science.

[50]  Leonardo Vanneschi,et al.  Fitness Distance Correlation And Problem Difficulty For Genetic Programming , 2002, GECCO.

[51]  Kevin Leyton-Brown,et al.  Performance Prediction and Automated Tuning of Randomized and Parametric Algorithms , 2006, CP.

[52]  M. O'Neill,et al.  Grammatical evolution , 2001, GECCO '09.

[53]  John R. Koza,et al.  Genetic programming - on the programming of computers by means of natural selection , 1993, Complex adaptive systems.

[54]  Mark Wallace,et al.  Principles and Practice of Constraint Programming – CP 2004 , 2004, Lecture Notes in Computer Science.

[55]  Anthony Brabazon,et al.  Defining locality as a problem difficulty measure in genetic programming , 2011, Genetic Programming and Evolvable Machines.

[56]  Ricardo A. Baeza-Yates,et al.  Searching in metric spaces , 2001, CSUR.

[57]  Panagiotis Stamatopoulos,et al.  Combinatorial optimization through statistical instance-based learning , 2001, Proceedings 13th IEEE International Conference on Tools with Artificial Intelligence. ICTAI 2001.

[58]  Riccardo Poli,et al.  A Field Guide to Genetic Programming , 2008 .

[59]  Kalyanmoy Deb,et al.  Analyzing Deception in Trap Functions , 1992, FOGA.

[60]  Yoav Shoham,et al.  A portfolio approach to algorithm selection , 2003, IJCAI 2003.

[61]  Riccardo Poli,et al.  General Schema Theory for Genetic Programming with Subtree-Swapping Crossover: Part II , 2003, Evolutionary Computation.

[62]  Kevin Leyton-Brown,et al.  Hierarchical Hardness Models for SAT , 2007, CP.

[63]  Ricardo Vilalta,et al.  A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.

[64]  Yoav Shoham,et al.  Learning the Empirical Hardness of Optimization Problems: The Case of Combinatorial Auctions , 2002, CP.

[65]  Leonardo Vanneschi,et al.  Theory and practice for efficient genetic programming , 2004 .

[66]  David E. Goldberg,et al.  Genetic Algorithm Difficulty and the Modality of Fitness Landscapes , 1994, FOGA.

[67]  John R. Rice,et al.  The Algorithm Selection Problem , 1976, Adv. Comput..

[68]  Kevin Leyton-Brown,et al.  SATzilla: Portfolio-based Algorithm Selection for SAT , 2008, J. Artif. Intell. Res..

[69]  Leonardo Vanneschi,et al.  Fitness Distance Correlation in Structural Mutation Genetic Programming , 2003, EuroGP.

[70]  Franz Rothlauf,et al.  On the Locality of Grammatical Evolution , 2006, EuroGP.

[71]  Toby Walsh,et al.  Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000 , 2000, ICML.