Paper Information - Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

Learning adaptable policies is crucial for robots to operate autonomously in our complex and quickly changing world. In this work, we present a new meta-learning method that allows robots to quickly adapt to changes in dynamics. In contrast to gradient-based meta-learning algorithms that rely on second-order gradient estimation, we introduce a more noise-tolerant Batch Hill-Climbing adaptation operator and combine it with meta-learning based on evolutionary strategies. Our method significantly improves adaptation to changes in dynamics in high-noise settings, which are common in robotics applications. We validate our approach on a quadruped robot that learns to walk while subject to changes in dynamics. We observe that our method significantly outperforms prior gradient-based approaches, enabling the robot to adapt its policy to changes based on less than 3 minutes of real data.
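The abstract names a Batch Hill-Climbing adaptation operator but does not define it. As a rough illustration only, the sketch below shows one generic form of batch hill climbing: at each step, sample a batch of random perturbations of the current parameters, evaluate each one, and move to the best candidate only if it beats the current point. The function names (`batch_hill_climb`, `toy_return`), the Gaussian perturbations, the hyperparameters, and the toy objective are all assumptions made for this example and are not taken from the paper, whose operator may differ in detail.

```python
import random


def batch_hill_climb(objective, theta, num_steps=10, batch_size=8,
                     step_size=0.1, seed=0):
    """Generic batch hill climbing (illustrative; not the paper's exact operator).

    objective: callable mapping a parameter list to a scalar return (higher is better).
    theta: initial parameter list, e.g. meta-learned policy parameters.
    """
    rng = random.Random(seed)
    theta = list(theta)
    best_value = objective(theta)
    for _ in range(num_steps):
        # Sample a batch of Gaussian perturbations around the current point.
        candidates = [
            [t + step_size * rng.gauss(0.0, 1.0) for t in theta]
            for _ in range(batch_size)
        ]
        values = [objective(c) for c in candidates]
        # Index of the best candidate in this batch.
        i = max(range(batch_size), key=values.__getitem__)
        # Accept the move only if it improves on the current estimate.
        if values[i] > best_value:
            theta, best_value = candidates[i], values[i]
    return theta, best_value


def toy_return(theta):
    """Hypothetical stand-in for an episode return: peak at (1, -2)."""
    target = [1.0, -2.0]
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


if __name__ == "__main__":
    adapted, value = batch_hill_climb(toy_return, [0.0, 0.0], num_steps=20)
    print(adapted, value)
```

Because each step only compares candidate returns rather than estimating a gradient from them, a single noisy evaluation can at worst cause one bad accept or reject, which is one intuition for why a comparison-based operator can tolerate noise better than a gradient estimate. On a real robot, `objective` would be a noisy rollout return, so the stored `best_value` would itself be a noisy estimate.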
