RMA: Rapid Motor Adaptation for Legged Robots

Successful real-world deployment of legged robots requires them to adapt in real time to unseen scenarios such as changing terrains, changing payloads, and wear and tear. This paper presents the Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge such as reference trajectories or predefined foot trajectory generators, and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains, including rocky, slippery, and deformable surfaces, in environments with grass, long vegetation, concrete, pebbles, stairs, and sand. RMA shows state-of-the-art performance across diverse real-world and simulation experiments. Video results are available at https://ashish-kmr.github.io/rma-legged-robots/.
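The two-component design described above can be illustrated with a minimal sketch: the base policy maps the current state, the previous action, and a latent extrinsics vector to motor commands, while the adaptation module estimates that extrinsics vector online from a window of recent state-action history, so no privileged simulator information is needed at deployment. The layer sizes, module names (`BasePolicy`, `AdaptationModule`), and dimensions below are illustrative assumptions, not the paper's exact architecture.

```python
# A minimal sketch of RMA's two-component structure, assuming PyTorch.
# Dimensions and layer sizes are illustrative, not the paper's exact values.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, EXTRINSICS_DIM, HISTORY_LEN = 30, 12, 8, 50

class BasePolicy(nn.Module):
    """Maps (state, previous action, extrinsics z) to joint targets."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + EXTRINSICS_DIM, 256),
            nn.ReLU(),
            nn.Linear(256, ACTION_DIM),
        )

    def forward(self, state, prev_action, z):
        return self.net(torch.cat([state, prev_action, z], dim=-1))

class AdaptationModule(nn.Module):
    """Estimates the extrinsics z from recent state-action history,
    replacing privileged simulator inputs at deployment time."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(  # temporal conv over the history window
            nn.Conv1d(STATE_DIM + ACTION_DIM, 32, kernel_size=5, stride=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, stride=2),
            nn.ReLU(),
        )
        self.head = nn.LazyLinear(EXTRINSICS_DIM)

    def forward(self, history):  # history: (batch, state+action, time)
        feats = self.conv(history).flatten(start_dim=1)
        return self.head(feats)

# One control step: estimate z-hat from history, then act with it.
policy, adapter = BasePolicy(), AdaptationModule()
history = torch.randn(1, STATE_DIM + ACTION_DIM, HISTORY_LEN)
state, prev_action = torch.randn(1, STATE_DIM), torch.randn(1, ACTION_DIM)
z_hat = adapter(history)
action = policy(state, prev_action, z_hat)
print(action.shape)  # torch.Size([1, 12])
```

Because the adaptation module only needs onboard state-action history, the estimated extrinsics can be refreshed every few control steps, which is what allows adaptation within fractions of a second.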
