RMA: Rapid Motor Adaptation for Legged Robots

Successful real-world deployment of legged robots requires them to adapt in real time to unseen scenarios such as changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge such as reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces, in environments with grass, long vegetation, concrete, pebbles, stairs, and sand. RMA shows state-of-the-art performance across diverse real-world and simulation experiments. Video results are available at https://ashish-kmr.github.io/rma-legged-robots/.
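The two-component design described above can be illustrated with a minimal sketch: the base policy maps the current state, the previous action, and a latent extrinsics vector to motor commands, while the adaptation module estimates that extrinsics vector online from a window of recent state-action history, so no privileged simulator information is needed at deployment. The layer sizes, module names (`BasePolicy`, `AdaptationModule`), and dimensions below are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of RMA's two-component structure, assuming PyTorch.
# Dimensions and layer sizes are illustrative, not the paper's exact values.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, EXTRINSICS_DIM, HISTORY_LEN = 30, 12, 8, 50

class BasePolicy(nn.Module):
    """Maps (state, previous action, extrinsics z) to joint targets."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + EXTRINSICS_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state, prev_action, z):
        return self.net(torch.cat([state, prev_action, z], dim=-1))

class AdaptationModule(nn.Module):
    """Estimates the extrinsics z from recent state-action history,
    replacing privileged simulator inputs at deployment time."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(  # temporal conv over the history window
            nn.Conv1d(STATE_DIM + ACTION_DIM, 32, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=2),
            nn.ReLU(),
        )
        self.head = nn.LazyLinear(EXTRINSICS_DIM)

    def forward(self, history):  # history: (batch, state+action, time)
        feats = self.conv(history).flatten(start_dim=1)
        return self.head(feats)

# One control step: estimate z-hat from history, then act with it.
policy, adapter = BasePolicy(), AdaptationModule()
history = torch.randn(1, STATE_DIM + ACTION_DIM, HISTORY_LEN)
state, prev_action = torch.randn(1, STATE_DIM), torch.randn(1, ACTION_DIM)
z_hat = adapter(history)
action = policy(state, prev_action, z_hat)
print(action.shape)  # torch.Size([1, 12])
```

Because the adaptation module only needs onboard state-action history, the estimated extrinsics can be refreshed every few control steps, which is what allows adaptation within fractions of a second.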
