Planning with Partially Specified Behaviors

In this paper we present PPSB, a framework that combines reinforcement learning and planning to solve sequential decision problems. Our aim is to show that reinforcement learning and planning complement each other, each taking advantage of the strengths of the other. PPSB uses partial action specifications to decompose a sequential decision problem into tasks that serve as an interface between reinforcement learning and planning. At the bottom level, reinforcement learning computes a policy for achieving each individual task. At the top level, planning produces a sequence of tasks that achieves an overall goal. Experiments show that our framework is competitive in realistic environments in which a robot has to perform several tasks.
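
To make the two-level structure concrete, below is a minimal sketch, assuming a hypothetical environment interface (`env.reset`, `env.step`, `env.task_done`) and tabular Q-learning for the per-task policies. It illustrates the idea of learned bottom-level task policies executed under a planner-produced task sequence; it is not the authors' implementation.

```python
# Minimal sketch of the two-level idea in PPSB.
# All names (TaskPolicy, execute_plan, env.*) are illustrative assumptions.
import random
from collections import defaultdict


class TaskPolicy:
    """Bottom level: tabular Q-learning policy for one task."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)          # Q-values keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


def execute_plan(plan, policies, env):
    """Top level: execute a task sequence produced by a symbolic planner,
    invoking the (already trained) policy for each task until it is done."""
    state = env.reset()
    for task in plan:
        policy = policies[task]
        while not env.task_done(task, state):
            action = policy.act(state)
            state, _ = env.step(action)
    return state
```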
