Robust Recovery Controller for a Quadrupedal Robot using Deep Reinforcement Learning

The ability to recover from a fall is essential for a legged robot to navigate challenging environments robustly. To date, there has been little progress on this topic: current solutions mostly build on heuristically predefined trajectories, resulting in unnatural behaviors and requiring considerable engineering effort for system-specific components. In this paper, we present an approach based on model-free Deep Reinforcement Learning (RL) to control recovery maneuvers of quadrupedal robots using a hierarchical behavior-based controller. The controller consists of four neural network policies: three behavior policies and one behavior selector that coordinates them. Each policy is trained individually in simulation and deployed directly on the real system. We experimentally validate our approach on ANYmal, a dog-sized quadrupedal robot with 12 degrees of freedom. With our method, ANYmal exhibits dynamic and reactive recovery behaviors, recovering from an arbitrary fall configuration in less than 5 seconds. We tested the recovery maneuver more than 100 times, with a success rate above 97%.
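To make the described architecture concrete, below is a minimal sketch of a hierarchical behavior-based controller in PyTorch. The observation size, network widths, the example behavior roles, and names such as RecoveryController and mlp are illustrative assumptions rather than the paper's exact design; only the 12-dimensional action space (one target per ANYmal joint) and the three-behaviors-plus-selector structure come from the text.

```python
# Sketch of a hierarchical behavior-based recovery controller:
# a selector network scores three behavior policies and the
# highest-scoring one produces the joint targets for this step.
import torch
import torch.nn as nn

OBS_DIM = 60       # assumed observation size (joint states, IMU, etc.)
ACT_DIM = 12       # ANYmal has 12 actuated joints
NUM_BEHAVIORS = 3  # e.g. self-righting, standing up, locomotion (assumed roles)

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    # Small tanh MLP, a common choice for RL policies; sizes are assumptions.
    return nn.Sequential(
        nn.Linear(in_dim, 128), nn.Tanh(),
        nn.Linear(128, 128), nn.Tanh(),
        nn.Linear(128, out_dim),
    )

class RecoveryController(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        # Three behavior policies, each mapping observations to joint targets.
        self.behaviors = nn.ModuleList(
            mlp(OBS_DIM, ACT_DIM) for _ in range(NUM_BEHAVIORS)
        )
        # A behavior selector that scores the behaviors from the same observation.
        self.selector = mlp(OBS_DIM, NUM_BEHAVIORS)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # At deployment, pick the highest-scoring behavior and run it.
        idx = int(torch.argmax(self.selector(obs), dim=-1).item())
        return self.behaviors[idx](obs)

controller = RecoveryController()
joint_targets = controller(torch.zeros(OBS_DIM))  # one control step
```

In this sketch each policy is a separate module, matching the paper's setup in which the behaviors and the selector are trained individually in simulation; how the selector's scores are trained (e.g. as a value-based or policy-gradient objective) is not specified here and is left out of the sketch.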
