Paper information - Delfi: Online Planner Selection for Cost-Optimal Planning

Delfi: Online Planner Selection for Cost-Optimal Planning

Cost-optimal planning has not seen many successful approaches that work well across all domains. Some cost-optimal planners excel on some domains while performing less impressively on others. For a particular domain, however, there is often a cost-optimal planner that works extremely well. For that reason, portfolio-based techniques have recently become popular. These either decide offline on a particular resource allocation scheme for a given collection of planners, or perform an online classification of a given planning task to select a planner for solving the task at hand. Our planner Delfi is an online portfolio planner. In contrast to existing techniques, Delfi exploits deep learning techniques to learn a model that predicts which of the planners in the portfolio can solve a given planning task within the imposed time and memory bounds. Delfi uses graphical representations of a planning task, which allows exploiting existing tools for image convolution. In this planner abstract, we describe the techniques used to create our portfolio planner.

Introduction

As planning is known to be computationally hard even for extremely conservative problem formalisms (Bylander 1994), no single planner should be expected to work well on all planning domains, or even on all tasks in a particular domain. As a result, research has focused not only on developing different planning techniques, such as improving search or heuristics, but also on exploiting multiple diverse approaches for solving planning tasks. One such approach is to aggregate multiple planners in a portfolio (Seipp et al. 2012; Vallati 2012; Cenamor, de la Rosa, and Fernández 2013; Seipp et al. 2015), which is what we do in this work. Such portfolios are often sequential and defined by two decisions: (i) which of the available planners to run next, and (ii) for how long to run it until the next planner is selected.
Furthermore, portfolio-based approaches can be partitioned into those that make these decisions ahead of time, called offline portfolios (Helmert et al. 2011; Núñez, Borrajo, and Linares López 2014; Seipp, Sievers, and Hutter 2014a; 2014b; 2014c), and those that make these decisions per given input task, called online portfolios (Cenamor, de la Rosa, and Fernández 2014).

Our planner, called Delfi for DEep Learning of PortFolIos, is an online portfolio planner submitted to the optimal classical track of the International Planning Competition (IPC) 2018. It consists of (a) a collection of cost-optimal planners based on Fast Downward (Helmert 2006), and (b) a module that, given a planning task, selects the planner from the collection for which the confidence that it solves the given planning task is highest. Once selected, the planner is run on the given task for the entire available time. In the remainder of this planner abstract, we describe both components in detail.

Collection of Cost-Optimal Planners

The large literature on classical planning has produced an extensive pool of available planning systems, all of which we could in principle use. However, a few aspects guided our decision to collect a rather small subset of specific planners. Firstly, integrating diverse planners within one system that can run them all in the same setting is a big (technical) challenge, and evaluating all of these planners for the training phase of learning the model would be extremely time-consuming. Secondly, with portfolio planners it is always difficult to clearly identify the components that are primarily responsible for the good performance of the portfolio. Bearing in mind the first aspect, we restricted the pool of planners to those based on Fast Downward (Helmert 2006). This has the additional advantage that we also explore how well a portfolio exclusively based on a single planning system fares.
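The selection step of component (b) described above (pick the planner with the highest confidence and give it the entire available time) can be sketched as follows. This is only a minimal illustration: the function name, the planner labels, the confidence values, and the 1800-second budget are assumptions made for the example, not Delfi's actual interface. In Delfi, the confidences would come from the learned model.

```python
from typing import Dict, Tuple


def select_planner(confidences: Dict[str, float], time_limit_s: float) -> Tuple[str, float]:
    """Return the planner with the highest confidence and its time budget.

    `confidences` maps a (hypothetical) planner label to the predicted
    confidence that this planner solves the task within the resource bounds.
    """
    if not confidences:
        raise ValueError("the planner collection must not be empty")
    # Pick the label with the largest confidence; on ties, max() keeps the
    # first such label in the dictionary's insertion order.
    best = max(confidences, key=confidences.get)
    # An online portfolio of this kind runs a single planner, so the selected
    # planner receives the whole time budget.
    return best, time_limit_s


# Illustrative values only (assumed, not taken from the paper).
predicted = {"planner_a": 0.35, "planner_b": 0.80, "planner_c": 0.62}
choice, budget = select_planner(predicted, time_limit_s=1800.0)
print(choice, budget)
```

Unlike a sequential portfolio, this scheme has no schedule to compute: decision (i) reduces to a single argmax and decision (ii) to assigning the full budget.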
With respect to the second aspect, we excluded all recent (and state-of-the-art) planners that have not been evaluated in any previous competition. In particular, many of these planners are submitted independently to the IPC 2018. Furthermore, we mainly focused on planners with main components that we co-developed in order to primarily evaluate our own contributions.

These considerations result in a collection of 17 planners for our portfolio planner Delfi. With the exception of SymBA∗ (Torralba et al. 2014), the winner of the IPC 2014, which is included as-is in our collection of planners, all planners are based on a recent version of Fast Downward. These 16 planners use A∗ search (Hart, Nilsson, and Raphael 1968) and differ in the subsets of the following additional components they use. Please refer to the Appendix for the complete list of planner configurations of our collection, which is identical for both variants of Delfi.

• Pruning based on partial order reduction using strong stubborn sets (Wehrle and Helmert 2014). Delfi uses the implementation of strong stubborn sets available in Fast Downward, which is based on the original implementation of Alkhazraji et al. (2012) and Wehrle and Helmert (2012) that has also been used in Metis 2014 (Alkhazraji et al. 2014). However, the current implementation has been improved in terms of efficiency since its original development (see http://issues.fast-downward.org/issue499 and http://issues.fast-downward.org/issue628). To support conditional effects, we extended the implementation in the same way as in Metis 2014. We also use the same mechanism that disables pruning after the first 1000 expansions if only 10% or fewer states have been pruned at this point. This component is part of all 16 planners.
• Pruning based on structural symmetries (Shleyfman et al. 2015) using DKS (Domshlak, Katz, and Shleyfman 2012) or orbit space search (OSS) (Domshlak, Katz, and Shleyfman 2015). We extended the original implementation of problem description graphs, also called symmetry graphs, which serve as the basis for computing symmetries, to support conditional effects. Sievers et al. (2017) recently gave a formal definition of this extension in the context of structural symmetries of lifted representations. Out of the 16 planners, 8 use DKS search and the other 8 use OSS, with no further difference except that merge-and-shrink configurations with OSS need to disable pruning of unreachable states to avoid incorrectly reporting pruned states as dead ends (cf. Sievers et al. 2015 for more details).
• Admissible heuristics:
  – The blind heuristic.
  – The LM-cut heuristic (Helmert and Domshlak 2009). To support conditional effects, we implemented a variant of the LM-cut heuristic that considers effect conditions in the same way as Metis 2014 (Alkhazraji et al. 2014) does. However, we do not switch between the regular LM-cut heuristic and the variant that supports conditional effects depending on the requirements of the input planning task. Instead, we always use the latter implementation, which comes with a small overhead due to the need for different data structures.
  – The canonical pattern database (CPDB) heuristic with hill climbing (HC) to compute pattern collections, also referred to as iPDB in the literature (Haslum et al. 2007). We add a time limit of 900s to the hill climbing algorithm and denote the planner by HC-CPDB.
  – The zero-one cost partitioning pattern database (ZOPDB) heuristic with a genetic algorithm (GA) to compute pattern collections (Edelkamp 2006). We call the planner GA-ZOPDB.
  – Four variants of the merge-and-shrink heuristic (Dräger, Finkbeiner, and Podelski 2009; Helmert et al. 2014; Sievers 2017). Three of them use the state-of-the-art shrink strategy based on bisimulation (Nissim, Hoffmann, and Helmert 2011) with a size limit of 50000 states on transition systems, always allowing (perfect) shrinking, called B. The fourth variant uses a greedy variant of B, called G, not imposing any size limit on transition systems, and also always allowing shrinking. All configurations use full pruning (Sievers 2017), i.e., always prune both unreachable and irrelevant states, unless combined with OSS as discussed above, in which case pruning of unreachable states is disabled. We perform exact label reductions based on Θ-combinability (Sievers, Wehrle, and Helmert 2014) with a fixed point algorithm using a random order on factors. Finally, all variants use a time limit of 900s for computing the heuristic, which leads to computing so-called partial merge-and-shrink abstractions that do not cover all variables of the task whenever the time limit is hit. In these cases, we pick one of the remaining induced heuristics according to the following rule of thumb: we prefer the heuristic with the largest estimate for the initial state (rationale: better informed heuristic), breaking ties in favor of larger factors (rationale: more fine-grained abstraction), and choose a random heuristic among all remaining candidates of equal preference. For more details on this, we refer to the paper introducing partial abstractions (Sievers 2018b) and the separate competition entry called Fast Downward Merge-and-Shrink (Sievers 2018a), which uses the same merge-and-shrink configurations as our portfolio. The remaining difference between the four variants is the merge strategy, which results in the following merge-and-shrink configurations:
    ∗ B-SCCdfp: the state-of-the-art merge strategy based on strongly connected components of the causal graph (Sievers, Wehrle, and Helmert 2016), which uses DFP (Sievers, Wehrle, and Helmert 2014) for internal merging.
    ∗ B-MIASMdfp: the entirely precomputed merge strategy maximum intermediate abstraction size minimizing (MIASM) (Fan, Müller, and Holte 2014), which uses DFP as a fallback mechanism.
    ∗ B-sbMIASM (previously also called DYN-MIASM): the merge strategy score-based MIASM (Sievers, Wehrle, and Helmert 2016), which is a simple variant of MIASM.
    ∗ G-SCCdfp: as SCCdfp, but with the greedy variant of bisimulation-based shrinking.

As mentioned above, each heuristic is used in two planner
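The mechanism from the strong stubborn sets item above, which disables pruning after the first 1000 expansions if only 10% or fewer states have been pruned, can be sketched as a small counter. This is a hedged illustration, not Fast Downward's implementation: the class and method names are hypothetical, and measuring the pruning ratio as pruned successors divided by generated successors is an assumption of this sketch.

```python
class PruningSwitch:
    """Tracks pruning statistics and turns pruning off if it is ineffective."""

    def __init__(self, check_after: int = 1000, min_ratio: float = 0.10) -> None:
        self.check_after = check_after  # expansions before the one-time check
        self.min_ratio = min_ratio      # pruning is kept only above this ratio
        self.expansions = 0
        self.generated = 0              # successors generated before pruning
        self.pruned = 0                 # successors removed by pruning
        self.checked = False
        self.enabled = True

    def record_expansion(self, generated: int, pruned: int) -> None:
        """Record one state expansion and run the check when it is due."""
        if self.checked:
            return  # the decision is made only once
        self.expansions += 1
        self.generated += generated
        self.pruned += pruned
        if self.expansions == self.check_after:
            self.checked = True
            # Assumed ratio definition: share of generated successors pruned.
            ratio = self.pruned / self.generated if self.generated else 0.0
            if ratio <= self.min_ratio:
                self.enabled = False


# Illustrative run: pruning never removes anything, so it gets switched off.
switch = PruningSwitch()
for _ in range(1000):
    switch.record_expansion(generated=10, pruned=0)
print(switch.enabled)
```

The point of such a check is to avoid paying the overhead of computing stubborn sets on tasks where they remove almost nothing.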
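The rule of thumb for partial merge-and-shrink abstractions described above (largest initial-state estimate first, then larger factor, then a random choice among the remaining ties) can be written as a short selection function. This is a sketch under stated assumptions: the `Factor` record, its field names, and the example values are hypothetical, and factor size is taken to mean the number of abstract states.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Factor:
    name: str    # hypothetical label for a remaining factor
    init_h: int  # heuristic estimate for the initial state
    size: int    # number of abstract states (assumed meaning of "larger")


def pick_heuristic(factors: List[Factor], rng: Optional[random.Random] = None) -> Factor:
    """Choose one remaining factor according to the rule of thumb."""
    if not factors:
        raise ValueError("at least one factor is required")
    rng = rng or random.Random()
    # Tuples compare lexicographically: higher initial estimate wins,
    # and among equal estimates the larger factor wins.
    best_key = max((f.init_h, f.size) for f in factors)
    candidates = [f for f in factors if (f.init_h, f.size) == best_key]
    # Remaining ties are broken uniformly at random.
    return rng.choice(candidates)


# Illustrative values only (assumed, not taken from the paper).
remaining = [Factor("f1", 5, 10), Factor("f2", 7, 3), Factor("f3", 7, 8)]
print(pick_heuristic(remaining).name)
```

Passing a seeded `random.Random` makes the final tie-break reproducible, which helps when comparing runs.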

Shirin Sohrabi | Horst Samulowitz | Michael Katz

[1] Malte Helmert, et al. About Partial Order Reduction in Planning and Computer Aided Verification, 2012, ICAPS.

[2] Jendrik Seipp, et al. Learning Portfolios of Automatically Tuned Planners, 2012, ICAPS.

[3] Jana Koehler. Handling of Conditional Effects and Negative Goals in IPP, 1999.

[4] Nils J. Nilsson, et al. A Formal Basis for the Heuristic Determination of Minimum Cost Paths, 1968, IEEE Trans. Syst. Sci. Cybern.

[5] Blai Bonet, et al. Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical Planners, 2009, ICAPS.

[6] Malte Helmert, et al. An Analysis of Merge Strategies for Merge-and-Shrink Heuristics, 2016, ICAPS.

[7] Silvan Sievers. Fast Downward Cedalion, 2014.

[8] Reuven Y. Rubinstein, et al. Optimization of computer simulation models with rare events, 1997.

[9] Álvaro Torralba, et al. A Reminder about the Importance of Computing and Exploiting Invariants in Planning, 2015, ICAPS.

[10] Jendrik Seipp, et al. Automatic Configuration of Sequential Planning Portfolios, 2015, AAAI.

[11] Jörg Hoffmann, et al. Fast Downward Stone Soup, 2011.