From Pixels to Legs: Hierarchical Learning of Quadruped Locomotion

Legged robots navigating crowded scenes and complex terrains in the real world are required to execute dynamic leg movements while processing visual input for obstacle avoidance and path planning. We show that a quadruped robot can acquire both of these skills by means of hierarchical reinforcement learning (HRL). By virtue of their hierarchical structure, our policies learn to implicitly break down this joint problem by concurrently learning High Level (HL) and Low Level (LL) neural network policies. These two levels are connected by a low dimensional hidden layer, which we call latent command. HL receives a first-person camera view, whereas LL receives the latent command from HL and the robot's on-board sensors to control its actuators. We train policies to walk in two different environments: a curved cliff and a maze. We show that hierarchical policies can concurrently learn to locomote and navigate in these environments, and show they are more efficient than non-hierarchical neural network policies. This architecture also allows for knowledge reuse across tasks. LL networks trained on one task can be transferred to a new task in a new environment. Finally HL, which processes camera images, can be evaluated at much lower and varying frequencies compared to LL, thus reducing computation times and bandwidth requirements.

[1]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[2]  Sergey Levine,et al.  Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings , 2018, ICML.

[3]  Benjamin Recht,et al.  Simple random search of static linear policies is competitive for reinforcement learning , 2018, NeurIPS.

[4]  Navdeep Jaitly,et al.  Robotic Table Tennis with Model-Free Reinforcement Learning , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[5]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[6]  Darwin G. Caldwell,et al.  Vision enhanced reactive locomotion control for trotting on rough terrain , 2013, 2013 IEEE Conference on Technologies for Practical Robot Applications (TePRA).

[7]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[8]  Akshara Rai,et al.  Learning Generalizable Locomotion Skills with Hierarchical Reinforcement Learning , 2019, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[9]  Stuart J. Russell,et al.  Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.

[10]  Sergey Levine,et al.  Data-Efficient Hierarchical Reinforcement Learning , 2018, NeurIPS.

[11]  Oliver Kroemer,et al.  Learning sequential motor tasks , 2013, 2013 IEEE International Conference on Robotics and Automation.

[12]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[13]  Atil Iscen,et al.  Policies Modulating Trajectory Generators , 2018, CoRL.

[14]  Doina Precup,et al.  Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..

[15]  Sergey Levine,et al.  QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.

[16]  Glen Berseth,et al.  DeepLoco , 2017, ACM Trans. Graph..

[17]  D. Sculley,et al.  Google Vizier: A Service for Black-Box Optimization , 2017, KDD.

[18]  Ion Stoica,et al.  Multi-Level Discovery of Deep Options , 2017, ArXiv.

[19]  Xi Chen,et al.  Evolution Strategies as a Scalable Alternative to Reinforcement Learning , 2017, ArXiv.

[20]  Wenbo Gao,et al.  Reinforcement Learning with Chromatic Networks , 2019, ArXiv.

[21]  Atil Iscen,et al.  Hierarchical Reinforcement Learning for Quadruped Locomotion , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Joonho Lee,et al.  Learning agile and dynamic motor skills for legged robots , 2019, Science Robotics.

[23]  Jitendra Malik,et al.  Gibson Env: Real-World Perception for Embodied Agents , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Atil Iscen,et al.  Provably Robust Blackbox Optimization for Reinforcement Learning , 2019, CoRL.

[25]  Pieter Abbeel,et al.  Stochastic Neural Networks for Hierarchical Reinforcement Learning , 2016, ICLR.

[26]  Sehoon Ha,et al.  Zero-shot Imitation Learning from Demonstrations for Legged Robot Visual Navigation , 2020, 2020 IEEE International Conference on Robotics and Automation (ICRA).

[27]  Sergey Levine,et al.  MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies , 2019, NeurIPS.

[28]  Doina Precup,et al.  The Option-Critic Architecture , 2016, AAAI.

[29]  Kate Saenko,et al.  Hierarchical Reinforcement Learning with Hindsight , 2018, ArXiv.

[30]  Sergey Levine,et al.  Collective robot reinforcement learning with distributed asynchronous guided policy search , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[31]  Sergey Levine,et al.  Near-Optimal Representation Learning for Hierarchical Reinforcement Learning , 2018, ICLR.

[32]  Philippe Martinet,et al.  Indoor Navigation of a Wheeled Mobile Robot along Visual Routes , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[33]  Jonathan W. Hurst,et al.  Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie , 2019, ArXiv.

[34]  Wenbo Gao,et al.  Reinforcement Learning with Chromatic Networks for Compact Architecture Search , 2019 .

[35]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[36]  Darwin G. Caldwell,et al.  Fast and Continuous Foothold Adaptation for Dynamic Locomotion Through CNNs , 2018, IEEE Robotics and Automation Letters.

[37]  Nicolas Heess,et al.  Hierarchical visuomotor control of humanoids , 2018, ICLR.

[38]  Silvio Savarese,et al.  HRL4IN: Hierarchical Reinforcement Learning for Interactive Navigation with Mobile Manipulators , 2019, CoRL.

[39]  Yuval Tassa,et al.  Learning and Transfer of Modulated Locomotor Controllers , 2016, ArXiv.

[40]  Karol Hausman,et al.  Learning an Embedding Space for Transferable Robot Skills , 2018, ICLR.

[41]  Glen Berseth,et al.  Terrain-adaptive locomotion skills using deep reinforcement learning , 2016, ACM Trans. Graph..