Learning Heuristic Selection with Dynamic Algorithm Configuration

A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. Aggregating multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be applied successfully. In addition, alternating between multiple heuristics during search makes it possible to use all heuristics equally and improves performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most useful heuristic for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection that takes the internal search dynamics of a planning system into account. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches in terms of coverage.
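The per-step selection described above can be illustrated with a small sketch: greedy best-first search keeps one open list per heuristic, and a selection policy decides, at every expansion step, which heuristic's open list to pop from. All names here are hypothetical illustrations, not the authors' implementation; a round-robin policy recovers plain alternation, while a learned policy could condition on search statistics instead.

```python
import heapq
import itertools

def gbfs_dynamic(initial, goal_test, successors, heuristics, select):
    """Greedy best-first search with one open list per heuristic.

    `select(step, avail)` returns the index of the heuristic whose open
    list is popped next, chosen among the non-empty lists `avail`. A
    learned policy would base this choice on search dynamics; here the
    caller supplies any function with this signature.
    """
    counter = itertools.count()  # tie-breaker so heap never compares states
    opens = [[(h(initial), next(counter), initial, (initial,))]
             for h in heuristics]
    closed = set()
    step = 0
    while any(opens):
        avail = [i for i, q in enumerate(opens) if q]
        i = select(step, avail)            # per-step heuristic selection
        _, _, state, path = heapq.heappop(opens[i])
        step += 1
        if state in closed:                # may sit in several open lists
            continue
        closed.add(state)
        if goal_test(state):
            return path
        for succ in successors(state):
            if succ not in closed:
                # enqueue the successor under every heuristic's estimate
                for j, h in enumerate(heuristics):
                    heapq.heappush(opens[j],
                                   (h(succ), next(counter), succ, path + (succ,)))
    return None

# Toy usage: reach 10 from 0 with steps +1/+2, using a goal-distance
# heuristic and a blind heuristic under a round-robin (alternation) policy.
path = gbfs_dynamic(
    initial=0,
    goal_test=lambda n: n == 10,
    successors=lambda n: [n + 1, n + 2],
    heuristics=[lambda n: abs(10 - n), lambda n: 0],
    select=lambda step, avail: avail[step % len(avail)],
)
```

Replacing the round-robin `select` with a function of accumulated search statistics (e.g. minimum h-value per open list, expansions since last improvement) is the point at which dynamic algorithm configuration enters: the policy maps the current search state to a heuristic choice.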
