Expanding Motor Skills using Relay Networks

While recent advances in deep reinforcement learning have achieved impressive results in learning motor skills, many learned policies are only effective within a limited set of initial states. We propose an algorithm that sequentially decomposes a complex robotic task into simpler subtasks and trains a local policy for each subtask, so that the robot gradually expands its existing skill set. Our key idea is to build a directed graph of local control policies represented by neural networks, which we refer to as relay neural networks. Starting from the first policy, which attempts to achieve the task from a small set of initial states, the algorithm iteratively discovers the next subtask with increasingly more difficult initial states until the last subtask matches the initial state distribution of the original task. The policy of each subtask aims to drive the robot to a state that the policy of its preceding subtask can handle. By taking advantage of existing actor-critic style policy search algorithms, we utilize the optimized value function to define the “good states” that the next policy relays to.
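
Below is a minimal sketch, in Python, of the construction-and-execution loop the abstract describes. It is an illustration under stated assumptions, not the authors' implementation: the helper callables (train_actor_critic, sample_harder_initial_states, covers_task_distribution), the quantile rule for the value-function threshold, and the binary relay bonus are all hypothetical placeholders.

```python
"""Hypothetical sketch of building and executing a chain of relay policies.
All helper callables are user-supplied assumptions, not the paper's code."""

from dataclasses import dataclass
from typing import Callable, List, Sequence

State = Sequence[float]   # placeholder state representation
Action = Sequence[float]  # placeholder action representation


@dataclass
class RelayNode:
    policy: Callable[[State], Action]    # local control policy for one subtask
    value_fn: Callable[[State], float]   # value function from actor-critic training
    threshold: float                     # V(s) >= threshold marks a "good state"


def value_threshold(value_fn: Callable[[State], float],
                    states: List[State],
                    quantile: float = 0.9) -> float:
    """Estimate a 'good state' cutoff as a quantile of V over sampled states
    (the quantile rule is an assumption, not the paper's criterion)."""
    vals = sorted(value_fn(s) for s in states)
    return vals[int(quantile * (len(vals) - 1))]


def build_relay_chain(train_actor_critic,            # (init_states, bonus) -> (policy, value_fn)
                      sample_harder_initial_states,  # chain -> harder initial-state set
                      covers_task_distribution,      # chain -> bool
                      easy_initial_states: List[State]) -> List[RelayNode]:
    """Grow a chain of local policies, each trained to reach the good-state
    set of its predecessor, until the original initial-state distribution
    of the task is covered."""
    chain: List[RelayNode] = []
    init_states = easy_initial_states
    relay_bonus = None  # the first policy optimizes the original task reward alone
    while True:
        policy, value_fn = train_actor_critic(init_states, relay_bonus)
        chain.append(RelayNode(policy, value_fn,
                               value_threshold(value_fn, init_states)))
        if covers_task_distribution(chain):
            return chain
        prev = chain[-1]
        # Reward the next policy for entering the predecessor's good-state set.
        relay_bonus = lambda s, prev=prev: float(prev.value_fn(s) >= prev.threshold)
        init_states = sample_harder_initial_states(chain)


def relay_action(chain: List[RelayNode], state: State) -> Action:
    """At execution time, hand control to the earliest policy whose good-state
    set already contains the current state; otherwise use the newest policy."""
    for node in chain:
        if node.value_fn(state) >= node.threshold:
            return node.policy(state)
    return chain[-1].policy(state)
```

In this sketch the directed graph of policies degenerates to a simple chain; a task whose subtasks have multiple predecessors would store edges per node and choose which predecessor's good-state set to relay to.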
