PG3: Policy-Guided Planning for Generalized Policy Generation

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
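The core idea in the abstract, using a candidate policy to guide planning on training problems and scoring the candidate by how well that guided search does, can be illustrated with a minimal sketch. All names here (`policy_guided_search`, `pg3_style_score`, the integer toy states) are hypothetical illustrations, not the paper's actual PDDL-based implementation:

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

State = int
Action = str


def policy_guided_search(
    policy: Callable[[State], Optional[Action]],
    init: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], List[Tuple[Action, State]]],
    max_expansions: int = 1000,
) -> Optional[int]:
    """Breadth-first search that tries the policy's preferred action first.

    A simplified sketch of the PG3 idea: the candidate policy steers
    planning on a training problem, but the search may deviate when the
    policy's choice fails, so partially correct policies still earn
    credit. Returns the plan length if a goal is reached, else None.
    """
    frontier = deque([(init, 0)])
    visited = {init}
    expansions = 0
    while frontier and expansions < max_expansions:
        state, depth = frontier.popleft()
        expansions += 1
        if is_goal(state):
            return depth
        preferred = policy(state)
        # Expand the policy-preferred successor before the others.
        ordered = sorted(successors(state), key=lambda a_s: a_s[0] != preferred)
        for _, nxt in ordered:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


def pg3_style_score(
    policy: Callable[[State], Optional[Action]],
    problems: List[Tuple[State, Callable[[State], bool]]],
    successors: Callable[[State], List[Tuple[Action, State]]],
) -> float:
    """Score a candidate policy by the fraction of training problems it
    helps the guided search solve (higher is better)."""
    solved = sum(
        1
        for init, is_goal in problems
        if policy_guided_search(policy, init, is_goal, successors) is not None
    )
    return solved / len(problems)


# Toy domain: walk a number line with "inc"/"dec" actions.
succ = lambda s: [("inc", s + 1), ("dec", s - 1)]
problems = [(0, lambda s: s == 3), (5, lambda s: s == 2)]
always_inc = lambda s: "inc"
score = pg3_style_score(always_inc, problems, succ)
```

In this toy domain a breadth-first search solves the problems regardless of guidance; the point of PG3's score function is that, under a bounded search budget in realistically large state spaces, which problems become solvable depends on how well the candidate policy guides the search.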
