PG3: Policy-Guided Planning for Generalized Policy Generation

A longstanding objective in classical planning is to synthesize policies that generalize across multiple problems from the same domain. In this work, we study generalized policy search-based methods with a focus on the score function used to guide the search over policies. We demonstrate limitations of two score functions and propose a new approach that overcomes these limitations. The main idea behind our approach, Policy-Guided Planning for Generalized Policy Generation (PG3), is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate. Theoretical results in a simplified setting give conditions under which PG3 is optimal or admissible. We then study a specific instantiation of policy search where planning problems are PDDL-based and policies are lifted decision lists. Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines. Code: https://github.com/ryangpeixu/pg3
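The core idea in the abstract, using a candidate policy to guide planning on training problems and scoring the candidate by how well that guided search does, can be illustrated with a minimal sketch. All names here (`policy_guided_search`, `pg3_style_score`, the integer toy states) are hypothetical illustrations, not the paper's actual PDDL-based implementation:

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

State = int
Action = str


def policy_guided_search(
    policy: Callable[[State], Optional[Action]],
    init: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], List[Tuple[Action, State]]],
    max_expansions: int = 1000,
) -> Optional[int]:
    """Breadth-first search that tries the policy's preferred action first.

    A simplified sketch of the PG3 idea: the candidate policy steers
    planning on a training problem, but the search may deviate when the
    policy's choice fails, so partially correct policies still earn
    credit. Returns the plan length if a goal is reached, else None.
    """
    frontier = deque([(init, 0)])
    visited = {init}
    expansions = 0
    while frontier and expansions < max_expansions:
        state, depth = frontier.popleft()
        expansions += 1
        if is_goal(state):
            return depth
        preferred = policy(state)
        # Expand the policy-preferred successor before the others.
        ordered = sorted(successors(state), key=lambda a_s: a_s[0] != preferred)
        for _, nxt in ordered:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, depth + 1))
    return None


def pg3_style_score(
    policy: Callable[[State], Optional[Action]],
    problems: List[Tuple[State, Callable[[State], bool]]],
    successors: Callable[[State], List[Tuple[Action, State]]],
) -> float:
    """Score a candidate policy by the fraction of training problems it
    helps the guided search solve (higher is better)."""
    solved = sum(
        1
        for init, is_goal in problems
        if policy_guided_search(policy, init, is_goal, successors) is not None
    )
    return solved / len(problems)


# Toy domain: walk a number line with "inc"/"dec" actions.
succ = lambda s: [("inc", s + 1), ("dec", s - 1)]
problems = [(0, lambda s: s == 3), (5, lambda s: s == 2)]
always_inc = lambda s: "inc"
score = pg3_style_score(always_inc, problems, succ)
```

In this toy domain a breadth-first search solves the problems regardless of guidance; the point of PG3's score function is that, under a bounded search budget in realistically large state spaces, which problems become solvable depends on how well the candidate policy guides the search.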
