Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolution strategies. Our method significantly improves adaptation to changes in dynamics in high-noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real-world data.
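
The abstract names the two ingredients but not their exact form. The following is therefore only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that Batch Hill-Climbing (BHC) scores the current parameters together with a batch of Gaussian perturbations and keeps the highest-scoring candidate. It also assumes that the evolution strategies (ES) outer loop estimates the gradient of the post-adaptation reward using antithetic perturbations. A toy quadratic reward with additive noise stands in for a noisy robot rollout. The names `batch_hill_climb`, `make_task`, `es_meta_train`, and all hyperparameter values are hypothetical.

```python
import numpy as np


def batch_hill_climb(theta, reward_fn, rng, iters=3, batch=8, step=0.1, rollouts=1):
    # Hypothetical BHC operator: each iteration scores the current parameters
    # (row 0) together with `batch` Gaussian perturbations and keeps the best.
    best = np.array(theta, dtype=float)
    for _ in range(iters):
        perturbed = best + step * rng.standard_normal((batch, best.size))
        candidates = np.vstack([best[None, :], perturbed])
        # Average `rollouts` noisy evaluations per candidate; the current point
        # is re-scored every iteration instead of reusing an old estimate.
        scores = [np.mean([reward_fn(c, rng) for _ in range(rollouts)]) for c in candidates]
        best = candidates[int(np.argmax(scores))]
    return best


def make_task(center, noise_std=0.1):
    # Toy stand-in for one dynamics setting: a noisy reward that peaks at `center`.
    def reward_fn(theta, rng):
        return -float(np.sum((theta - center) ** 2)) + noise_std * rng.standard_normal()
    return reward_fn


def es_meta_train(theta, sample_task, rng, meta_iters=50, pairs=8, sigma=0.1,
                  lr=0.05, eval_rollouts=3):
    # ES outer loop: gradient ascent on the expected reward measured after BHC adaptation.
    theta = np.array(theta, dtype=float)
    for _ in range(meta_iters):
        grad = np.zeros_like(theta)
        for _ in range(pairs):
            eps = rng.standard_normal(theta.size)
            reward_fn = sample_task(rng)
            seed = int(rng.integers(2**31))
            returns = []
            for sign in (1.0, -1.0):
                # Common random numbers: both antithetic members share task and seed.
                inner = np.random.default_rng(seed)
                adapted = batch_hill_climb(theta + sign * sigma * eps, reward_fn, inner)
                returns.append(np.mean([reward_fn(adapted, inner) for _ in range(eval_rollouts)]))
            grad += (returns[0] - returns[1]) * eps
        theta += lr * grad / (2 * pairs * sigma)
    return theta


# Usage: each task is a quadratic bowl whose center is scattered around mu.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])


def sample_task(r):
    return make_task(mu + 0.5 * r.standard_normal(2))


theta0 = np.zeros(2)
theta_meta = es_meta_train(theta0, sample_task, rng)
print("distance to mu before:", np.linalg.norm(theta0 - mu))
print("distance to mu after: ", np.linalg.norm(theta_meta - mu))
```

In this sketch, BHC re-scores the current parameters at every step instead of reusing an earlier estimate, so a single lucky noisy evaluation cannot freeze the search. Both members of each antithetic pair also share the same task and random seed. These common random numbers remove much of the inner-loop noise from the ES difference. Under these assumptions, the meta-trained starting point should drift toward the center of the task distribution, and a few BHC steps then specialize it to each individual task.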
