Using Learned Policies in Heuristic-Search Planning

Many current state-of-the-art planners rely on forward heuristic search. The success of such search typically depends on heuristic distance-to-goal estimates derived from the planning graph. These estimates guide search effectively in many domains, but many other domains remain where current heuristics are inadequate for forward search. In some of these domains, it is possible to learn reactive policies from example plans that solve many problems. However, due to the inductive nature of these learning techniques, the policies are often faulty and fail to achieve high success rates. In this work, we consider how to effectively integrate imperfect learned policies with imperfect heuristics so as to improve over each alone. We propose a simple approach that uses the policy to augment the states expanded during each search step. In particular, during each node expansion, we add not only the node's neighbors, but all the nodes along the trajectory that the policy follows from the node up to some horizon. Empirical results show that our proposed approach benefits both of the automated techniques it leverages, learning and heuristic search, outperforming the state of the art in most benchmark planning domains.
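The expansion scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a greedy best-first search loop, a `policy` function returning the next state (or `None` when it has no recommendation), and a caller-supplied `horizon`; all of these names are hypothetical.

```python
import heapq
import itertools

def policy_augmented_search(start, is_goal, successors, heuristic, policy, horizon=10):
    """Greedy best-first search whose node expansions are augmented with the
    states along a (possibly faulty) learned policy's rollout, as described
    in the abstract. A sketch, not the paper's implementation."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    open_list = [(heuristic(start), next(counter), start, [start])]
    visited = {start}
    while open_list:
        _, _, state, path = heapq.heappop(open_list)
        if is_goal(state):
            return path
        # Ordinary expansion: the node's immediate neighbors.
        candidates = [(nxt, path + [nxt]) for nxt in successors(state)]
        # Augmentation: every state on the policy's trajectory up to the horizon.
        traj, s = path, state
        for _ in range(horizon):
            s = policy(s)
            if s is None:  # policy has no suggestion; stop the rollout
                break
            traj = traj + [s]
            candidates.append((s, traj))
        for nxt, nxt_path in candidates:
            if nxt not in visited:
                visited.add(nxt)
                heapq.heappush(open_list, (heuristic(nxt), next(counter), nxt, nxt_path))
    return None  # search space exhausted without reaching the goal
```

On a toy one-dimensional problem (states are integers, goal 10, a perfect policy that steps toward the goal), a single expansion of the start state already enqueues the goal, so the search returns the full trajectory immediately; when the policy is faulty, the heuristic-ordered open list falls back to ordinary expansion.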
