Policies that Generalize: Solving Many Planning Problems with the Same Policy

We establish conditions under which memoryless policies and finite-state controllers that solve one partially observable non-deterministic problem (PONDP) generalize to other problems; namely, problems that have a similar structure and share the same action and observation spaces. This is relevant to generalized planning, where plans that work for many problems are sought, and to transfer learning, where knowledge gained in solving one problem is to be used on related problems. We use a logical setting in which uncertainty is represented by sets of states and the goal must be achieved with certainty. While this gives us crisp notions of solution policies and generalization, the account also applies to probabilistic PONDPs, i.e., Goal POMDPs.
