Fitness landscapes and problem hardness in genetic programming

The performance of searching agents, or metaheuristics, like evolutionary algorithms (genetics algorithms, genetic programming, etc.) or local search algorithms (simulated annealing, tabu search, etc.) depend on some properties of the search space structure. One concept that allows us to analyse the search space is the fitness landscape. In the case of Genetic Programming, defining and handling fitness landscapes is a particularly hard task, given the complexity of the structures being evolved of the genetic operators used. This tutorial presents some general definitions of fitness landscape. Subsequently, we will try to instantiate the concept of fitness landscape to Genetic Programming, discussing problems. The concept of landcsape geometry will be introduced and some of the most common landscape geometries and the dynamics of Genetic Programming on those landscapes will be discussed. After that, the binding between fitness landscapes and problem difficulty will be discussed and a set of measures that characterize the difficulty of a metaheuristic in searching solutions in a fitness landscape are analysed. Among those measures, particular relevance will be given to Fitness Distance Correlation (FDC), Negative Slope Coefficient (NSC), a set of measures bound to the concept of Neutrality and some distance metrics and/or similarity measures that are consistent with the most commonly used genetic operators (in particular the recently defined subtree crossover based distance). Finally, some open questions about fitness landscapes are discussed.