An Efficient Approach to Model-Based Hierarchical Reinforcement Learning

We propose a model-based approach to hierarchical reinforcement learning that exploits shared knowledge and selective execution at different levels of abstraction to efficiently solve large, complex problems. Our framework adopts a new transition-dynamics learning algorithm that identifies the action-feature combinations common to the subtasks, and evaluates subtask execution choices through simulation. The framework is sample efficient and tolerates uncertain, incomplete characterization of the subtasks. We test the framework on common benchmark problems and on complex simulated robotic environments; it compares favorably against state-of-the-art algorithms and scales well to very large problems.
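To make the two mechanisms in the abstract concrete, here is a minimal sketch of (1) pooling transition statistics across subtasks that share per-feature dynamics under the same action, and (2) scoring a subtask execution choice by Monte Carlo rollouts in the learned model. It assumes a factored state represented as a feature-to-value dict; all names (SharedDynamicsModel, evaluate_subtask, and so on) are illustrative placeholders, not the paper's actual algorithm or API.

```python
import random
from collections import defaultdict

class SharedDynamicsModel:
    """Pools transition counts for (feature, value, action) triples so that
    experience gathered in one subtask informs every subtask whose dynamics
    depend on the same action-feature combination (an assumption of this
    sketch, loosely in the spirit of the abstract)."""

    def __init__(self):
        # counts[(feature, value, action)][next_value] -> visit count
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, state, action, next_state):
        # Record each feature's transition independently (factored model).
        for feature, value in state.items():
            self.counts[(feature, value, action)][next_state[feature]] += 1

    def sample_next(self, state, action):
        # Sample a successor state one feature at a time from the counts.
        next_state = {}
        for feature, value in state.items():
            outcomes = self.counts[(feature, value, action)]
            if not outcomes:                  # unvisited: assume no change
                next_state[feature] = value
                continue
            total = sum(outcomes.values())
            r = random.uniform(0, total)
            for outcome, n in outcomes.items():
                r -= n
                if r <= 0:
                    next_state[feature] = outcome
                    break
        return next_state

def evaluate_subtask(model, policy, reward_fn, start_state,
                     horizon=50, rollouts=100):
    """Estimate the value of executing a subtask's policy by simulating
    it in the learned shared model (plain Monte Carlo rollouts)."""
    total = 0.0
    for _ in range(rollouts):
        state, ret = dict(start_state), 0.0
        for _ in range(horizon):
            action = policy(state)
            state = model.sample_next(state, action)
            ret += reward_fn(state)
        total += ret
    return total / rollouts
```

Under this setup, a higher-level controller would call evaluate_subtask once per candidate subtask and execute the one with the highest simulated return; because the counts are pooled, a subtask never tried in some region can still be evaluated using dynamics learned elsewhere.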
