A large number of empirical methods for characterizing problem difficulty have appeared in the last 15 or so years, including the work on phase transitions, fitness landscape correlation length, analysis of optima distributions, and algorithm run-time distributions. These methods have been successful either in predicting the difficulty of an ensemble of problem instances or in providing descriptive characterizations of algorithm performance. However, they are of limited use in explaining and predicting the performance of algorithms on individual problem instances. We argue that the development of empirical methods for characterizing problem difficulty at the instance level is necessary for an advanced understanding of algorithm behavior. Further, the practical benefit is tremendous, enabling 1) the development of more comprehensive benchmarks, 2) problem-sensitive algorithm selection, and 3) intelligent tuning of problem-sensitive algorithm parameters.

Which problem instances are difficult? What makes a problem instance difficult? Why does one algorithm outperform another on a particular problem instance? Attempts to answer these and other related open questions have led to the development of many empirical methods for characterizing problem difficulty. These methods have been designed, explicitly or implicitly, either for predicting the difficulty of an ensemble of problem instances or for providing a purely descriptive characterization of algorithm performance on a specific problem instance, and have generally achieved these design goals. Yet, many basic questions involving problem difficulty and algorithm performance remain unanswered; for example, we still do not know why meta-heuristics such as tabu search generally work better than simulated annealing, and we cannot predict which specific problem instances will cause difficulties for particular algorithms.
The current situation naturally leads us to question our goals in designing empirical methods for characterizing problem difficulty. We argue that a new class of empirical methods is required, one that possesses predictive and explanatory capabilities at the instance, as opposed to ensemble, level. We developed this position through our investigations into the relationship between structural features of problem instances and the performance of local search algorithms for the flow-shop (FSP) and job-shop (JSP) scheduling problems [WBHW99] [Wat00]. Our research indicates a strong relationship between the type and strength of structural features found in a particular problem instance and algorithm performance; the presence of some features made optimization trivial, while others made it exceptionally difficult. Further, we found this relationship to be algorithm-dependent: problem instances with a specific type of feature may be easy for a genetic algorithm, but very difficult for tabu search. These observations are consistent with the well-known theoretical result that no single algorithm can provide the best performance over all possible problem instances (the No Free Lunch Theorem [WM95]).

Based on these observations, we began researching empirical methods for characterizing problem difficulty, both for local search algorithms and for the more general case. Our principal goal was to determine whether any existing methods could predict the presence (or absence, in the case of completely random problems) of particular structural features in various problem instances. First, we considered empirical methods specifically targeted at analyzing problem difficulty for local search algorithms, which are often the algorithms of choice for combinatorial optimization problems such as the Traveling Salesman Problem (TSP) and the JSP. Such algorithms consist of a move operator combined with a meta-heuristic, such as tabu search or simulated annealing.
Together, the move operator and the objective function define the search space, referred to in the local search community as the fitness landscape. Clearly, the success of any local search algorithm depends exclusively on 1) the fitness landscape imposed by the combination of move operator and problem instance, and 2) how successfully the meta-heuristic can navigate this landscape. All empirical methods for characterizing problem difficulty for local search algorithms operate by computing summary statistics for samples of a fitness landscape. The most prominent of these methods are correlation length and the analysis of local optima distributions.

Correlation length [Wei90] is computed from the objective function values obtained along a random walk of a fitness landscape; longer correlation lengths suggest a smoother landscape, which is in turn easier to search. While it is often used to demonstrate the benefit of one move operator over another [MdWS91], recent analytic work has established that correlation length is often strictly a function of problem size and move operator, and therefore cannot be used to differentiate among problem instances of equivalent size [SS92] [SH92]. We have demonstrated this invariance empirically for the JSP [Wat00]. Further, [Ran98] has empirically demonstrated that the correlation length of MAXSAT problems is actually independent of the SAT phase transition in problem difficulty.

Analysis of local optima distributions has been widely used to explain why local search algorithms work well for particular types of optimization problems. Several forms of this analysis exist, the foremost being the Fitness-Distance Correlation (FDC) analysis of [BKM94]. Here, several local optima are randomly generated, a scatter-plot of objective value vs. distance to the best solution is produced, and a correlation coefficient is computed.
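The correlation length measure described above is straightforward to compute. The sketch below is a minimal illustration, not an implementation from any of the cited works: it assumes a toy landscape of bit strings under a single-bit-flip move operator, with the number of ones as a "smooth" fitness function and a deterministic pseudo-random value as an "uncorrelated" one; all function names, the walk length, and the string length are our own illustrative choices.

```python
import math
import random

def random_walk_fitnesses(start, neighbor, fitness, steps, rng):
    """Record objective values along a random walk of the landscape."""
    x, series = start, []
    for _ in range(steps):
        series.append(fitness(x))
        x = neighbor(x, rng)
    return series

def autocorrelation(series, lag=1):
    """Empirical autocorrelation of the fitness series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((f - mean) ** 2 for f in series) / n
    if var == 0:
        return 0.0
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var

def correlation_length(series):
    """l = -1 / ln|r(1)|: longer values suggest a smoother landscape."""
    r1 = abs(autocorrelation(series, lag=1))
    return -1.0 / math.log(r1) if 0.0 < r1 < 1.0 else float("inf")

def flip_one_bit(x, rng):
    """Single-bit-flip move operator on a bit-string solution."""
    i = rng.randrange(len(x))
    return x[:i] + [1 - x[i]] + x[i + 1:]

rng = random.Random(0)
start = [rng.randint(0, 1) for _ in range(50)]
# Smooth landscape: fitness = number of ones; neighbors differ by exactly 1.
smooth = random_walk_fitnesses(start, flip_one_bit, sum, 2000, rng)
# Uncorrelated landscape: fitness is a deterministic pseudo-random value.
rand_fit = lambda x: random.Random(hash(tuple(x))).random()
rough = random_walk_fitnesses(start, flip_one_bit, rand_fit, 2000, rng)
print(correlation_length(smooth) > correlation_length(rough))  # True
```

On the smooth toy landscape adjacent fitness values are highly correlated, giving a long correlation length, while the uncorrelated landscape yields a value near zero, consistent with the "smoother is easier" interpretation.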
High correlation suggests a 'big-valley' distribution, which [BKM94] cite as evidence for why iterated local search works well for the TSP, and which [RY98] cite as evidence for why path relinking works well for the FSP. Empirically, the ability of these two analysis methods to differentiate problem instances depends heavily on the type of problem considered (i.e., TSP vs. JSP). FDC is able to detect different types of structure for the Graph Bipartitioning Problem [MF00]. In contrast, we have found identical FDC results for both easy and difficult random TSP and FSP problem instances of the same size. Further, we found both of these methods incapable of differentiating either random or structured JSP problem instances [Wat00]. Both correlation length and FDC-like analyses have been used primarily as a posteriori tools for explaining the difficulty of particular problem instances. However, neither method succeeds in this goal, as relative problem difficulty cannot be accurately determined: correlation length depends only on problem size (and move operator), so it cannot be used to rank the difficulty of a number of equal-sized problem instances, while FDC is marginally successful in some problem domains and fails completely in others.

Two other empirical methods that are not specifically tied to local search algorithms are widely used in the AI community: phase transitions and run-time distributions. Phase transitions characterize the difficulty of a set of problems sharing a common value of some metric (e.g., the ratio of clauses to variables in SAT problems). Phase transitions do accurately identify the types of problems that are likely to be difficult; however, they are intended as an average-case analysis tool, and work on the TSP [GW95] and JSP [BJ97] phase transitions indicates that very easy instances can still reside within the transition region.
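The FDC computation described above reduces to a correlation coefficient over a sample of solutions. The sketch below is a minimal illustration, not the procedure of [BKM94]: it assumes a toy maximization problem over bit strings with Hamming distance, and it samples random solutions in place of true local optima; the function names and sample size are our own illustrative choices.

```python
import math
import random

def fdc(solutions, best, fitness, distance):
    """Correlation coefficient between objective value and distance-to-best
    over a sample of solutions (ideally, local optima)."""
    fs = [fitness(x) for x in solutions]
    ds = [distance(x, best) for x in solutions]
    n = len(solutions)
    mf, md = sum(fs) / n, sum(ds) / n
    cov = sum((f - mf) * (d - md) for f, d in zip(fs, ds)) / n
    sf = math.sqrt(sum((f - mf) ** 2 for f in fs) / n)
    sd = math.sqrt(sum((d - md) ** 2 for d in ds) / n)
    return cov / (sf * sd) if sf > 0 and sd > 0 else 0.0

def hamming(a, b):
    """Hamming distance between two bit strings."""
    return sum(x != y for x, y in zip(a, b))

# Toy 'big valley': fitness (ones count, maximized) falls off linearly with
# Hamming distance to the all-ones optimum, so FDC approaches -1.
rng = random.Random(1)
best = [1] * 30
sample = [[rng.randint(0, 1) for _ in range(30)] for _ in range(200)]
print(round(fdc(sample, best, sum, hamming), 6))  # -1.0
```

For a maximization problem, strongly negative FDC (fitness falling as distance to the optimum grows) is the 'big-valley' signature; the degenerate toy landscape above produces it exactly.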
Run-time distributions [Hoo98] completely characterize observable algorithm performance, but provide little insight into the 'why' of algorithm performance, i.e., which features of the fitness landscape are causing the difficulties.

Clearly, the performance of search algorithms depends crucially on the search space topology induced by particular problem instances. For the JSP and other problems, we have found that existing empirical methods for characterizing problem difficulty either 1) measure trends within large ensembles of problems, 2) are sensitive only to problem dimension or type, or 3) abstract away the details of the search space topology. Thus, our experience leads us to the following position: empirical methods for characterizing problem difficulty need to differentiate the search space topologies of particular problem instances if we are to answer the open questions posed at the beginning of this paper. Ultimately, such methods would provide a mechanism to determine 1) the features of test problems that induce different search space topologies, 2) the topologies that cause difficulties for particular algorithms, and 3) why those topologies cause difficulties. Unfortunately, we found that existing methods could not achieve these goals, for the reasons discussed above. This is not to say that existing empirical methods are somehow invalid; indeed, they have provided valuable answers to several important research questions, just not to the questions we are asking.

Several immediate, practical benefits follow from the use of such empirical methods. First, they provide a more principled method for constructing benchmark test suites. Existing benchmarks (such as those for the JSP) are often formed by filtering for 'difficult' problems, where difficulty is measured relative to a specific algorithm.
A more principled approach to benchmark construction would involve the inclusion of problems with demonstrably different search space topologies, as different topologies cause problems for different algorithms. Second, because such methods are sensitive to individual problem characteristics, we can dynamically select the algorithm most appropriate for a given problem; without such a methodology, a sub-optimal algorithm is likely to be chosen for an arbitrary problem instance. Third, a deeper understanding of search space topologies should enable the intelligent tuning of problem-sensitive algorithm parameters.
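The run-time distributions of [Hoo98] discussed earlier are simple to collect empirically. The sketch below is a toy illustration of the idea, not taken from that work: it assumes blind random guessing on an 8-bit target string as the 'algorithm' and iteration counts as the run-time measure; the function names and parameters are our own illustrative choices.

```python
import random

def run_length(target, rng):
    """Iterations of blind random guessing until the target string is found."""
    steps = 0
    while True:
        steps += 1
        guess = [rng.randint(0, 1) for _ in range(len(target))]
        if guess == target:
            return steps

def empirical_rtd(target, runs, seed=0):
    """Sorted run lengths over independent runs: the empirical RTD."""
    rng = random.Random(seed)
    return sorted(run_length(target, rng) for _ in range(runs))

def quantile(sorted_lengths, q):
    """q-th quantile of a sorted sample (nearest-rank)."""
    i = min(int(q * len(sorted_lengths)), len(sorted_lengths) - 1)
    return sorted_lengths[i]

rtd = empirical_rtd([1] * 8, runs=100)
# With success probability 1/256 per guess, the run lengths are geometrically
# distributed and the median run is near 256 * ln 2, roughly 177 iterations.
print(quantile(rtd, 0.5))
```

The sorted sample is exactly the empirical run-time distribution: plotting rank/n against run length gives the RTD curve, and quantiles summarize it. As the text notes, this fully characterizes observable performance while saying nothing about which landscape features produce it.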
References

[1] D. Wolpert et al. No Free Lunch Theorems for Search, 1995.
[2] J. C. Jackson. Constrainedness and the Phase Transition in Job Shop Scheduling, 1997.
[3] Bernd Freisleben et al. Fitness Landscapes, Memetic Algorithms, and Greedy Operators for Graph Bipartitioning, Evolutionary Computation, 2000.
[4] Toby Walsh et al. The TSP Phase Transition, Artificial Intelligence, 1996.
[5] Bernard Manderick et al. The Genetic Algorithm and the Structure of the Fitness Landscape, ICGA, 1991.
[6] L. Darrell Whitley et al. Algorithm Performance and Problem Structure for Flow-shop Scheduling, AAAI/IAAI, 1999.
[7] Holger H. Hoos et al. Stochastic Local Search: Methods, Models, Applications, DISKI, 1998.
[8] J. Watson. Problem Difficulty and Fitness Landscapes of Structured and Random Job-Shop Problems: What Do Existing Analysis Techniques Really Tell Us?, 2000.
[9] Andrew B. Kahng et al. A New Adaptive Multi-Start Technique for Combinatorial Global Optimizations, Operations Research Letters, 1994.
[10] P. Stadler et al. The Landscape of the Traveling Salesman Problem, 1992.
[11] E. Weinberger et al. Correlated and Uncorrelated Fitness Landscapes and How to Tell the Difference, Biological Cybernetics, 1990.
[12] Soraya B. Rana. Examining the Role of Local Optima and Schema Processing in Genetic Search, 1998.
[13] John N. Hooker et al. Testing Heuristics: We Have It All Wrong, Journal of Heuristics, 1995.
[14] Takeshi Yamada et al. Genetic Algorithms, Path Relinking, and the Flowshop Sequencing Problem, Evolutionary Computation, 1998.