Planning with Abstract Markov Decision Processes

Robots acting in human-scale environments must plan under uncertainty in large state–action spaces, and must cope with reward functions that shift as requirements and goals change. Hierarchical abstraction is essential for making such planning computationally tractable. We introduce Abstract Markov Decision Processes (AMDPs), a hierarchical planning framework that plans in a fraction of the time required by ordinary, flat MDPs for complex decision-making problems. AMDPs define abstract states, actions, and transition dynamics in multiple layers above a base-level “flat” MDP, decomposing a problem into a hierarchy of subtasks, each with its own local reward and local transition functions that are used to compute a policy for that subtask. The resulting hierarchical planning method is independently optimal at each level of abstraction, and is recursively optimal when the local reward and transition functions are correct. We present empirical results showing significantly improved planning speed, while maintaining solution quality, in the Taxi domain and in a mobile-manipulation robotics problem. Furthermore, our approach supports the specification of a single decision-making model for a mobile-manipulation task on a Turtlebot, spanning from low-level control actions operating on continuous variables up through high-level object-manipulation subtasks.
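
To make the decomposition concrete, the sketch below illustrates, in Python and with hypothetical names such as `LocalMDP`, `project`, and `children`, how each level of an AMDP can be treated as a small MDP with its own local reward and transition functions, planned independently, and expanded top-down until a primitive action is reached. This is a minimal sketch of the idea, assuming the local models at every level are already specified; it is not the authors' implementation.

```python
# Minimal AMDP-style sketch, not the authors' implementation.
# Assumptions (hypothetical names): each level of the hierarchy is a small finite MDP
# whose "actions" are subtasks; executing an abstract action means planning in that
# subtask's own local MDP one level down, with local reward and transition functions.

from collections import defaultdict


class LocalMDP:
    """One node of the AMDP hierarchy: a subtask with its own local model."""

    def __init__(self, states, actions, transition, reward, gamma=0.95):
        self.states = states          # abstract states visible at this level
        self.actions = actions        # abstract actions (child subtasks or primitives)
        self.transition = transition  # transition(s, a) -> {s_next: probability}
        self.reward = reward          # reward(s, a, s_next) -> float (local reward)
        self.gamma = gamma

    def value_iteration(self, tol=1e-6):
        """Plan within this level only, independently of the levels above and below."""
        V = defaultdict(float)
        while True:
            delta = 0.0
            for s in self.states:
                best = max(
                    sum(p * (self.reward(s, a, s2) + self.gamma * V[s2])
                        for s2, p in self.transition(s, a).items())
                    for a in self.actions
                )
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < tol:
                break
        # Extract the greedy policy for this subtask from the converged values.
        return {
            s: max(
                self.actions,
                key=lambda a: sum(p * (self.reward(s, a, s2) + self.gamma * V[s2])
                                  for s2, p in self.transition(s, a).items()),
            )
            for s in self.states
        }


def plan_top_down(node, project, children, env_state):
    """Solve the current level, then expand the chosen abstract action by planning
    in the child subtask it names, recursing until a primitive action is reached.

    project(node, env_state) -> the abstract state this level perceives (assumed given)
    children[action]         -> the LocalMDP for that subtask, or None if primitive
    """
    policy = node.value_iteration()
    abstract_action = policy[project(node, env_state)]
    child = children.get(abstract_action)
    if child is None:
        return abstract_action  # primitive action: hand it to the low-level controller
    return plan_top_down(child, project, children, env_state)
```

Because each level plans only over its own abstract states and subtasks, the work done at any node stays small even when the flat MDP is enormous; the quality of the overall behavior then hinges on how faithful the local reward and transition models are, which is the recursive-optimality condition stated above.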
