Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer

Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which can not be accomplished by strong baselines.

[1]  L. Melo Transformers are Meta-Reinforcement Learners , 2022, ICML.

[2]  Yaodong Yang,et al.  Multi-Agent Reinforcement Learning is a Sequence Modeling Problem , 2022, NeurIPS.

[3]  Sergio Gomez Colmenarejo,et al.  A Generalist Agent , 2022, Trans. Mach. Learn. Res..

[4]  Lorenz Wellhausen,et al.  Learning robust perceptive locomotion for quadrupedal robots in the wild , 2022, Science Robotics.

[5]  Ross B. Girshick,et al.  Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Max Q.-H. Meng,et al.  Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion , 2021, IEEE Robotics and Automation Letters.

[7]  Xiaolong Wang,et al.  Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers , 2021, ICLR.

[8]  Philipp Reist,et al.  Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning , 2021, CoRL.

[9]  Miles Macklin,et al.  Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning , 2021, NeurIPS Datasets and Benchmarks.

[10]  Jitendra Malik,et al.  RMA: Rapid Motor Adaptation for Legged Robots , 2021, Robotics: Science and Systems.

[11]  Sergey Levine,et al.  Offline Reinforcement Learning as One Big Sequence Modeling Problem , 2021, NeurIPS.

[12]  Pieter Abbeel,et al.  Decision Transformer: Reinforcement Learning via Sequence Modeling , 2021, NeurIPS.

[13]  Byron Boots,et al.  Fast and Efficient Locomotion via Learned Gait Transitions , 2021, CoRL.

[14]  Alec Radford,et al.  Zero-Shot Text-to-Image Generation , 2021, ICML.

[15]  Kevin Lin,et al.  End-to-End Human Pose and Mesh Reconstruction with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Xingye Da,et al.  Dynamics Randomization Revisited: A Case Study for Quadrupedal Locomotion , 2020, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[17]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[18]  Lorenz Wellhausen,et al.  Learning quadrupedal locomotion over challenging terrain , 2020, Science Robotics.

[19]  Tomi Westerlund,et al.  Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a Survey , 2020, 2020 IEEE Symposium Series on Computational Intelligence (SSCI).

[20]  Byron Boots,et al.  Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion , 2020, CoRL.

[21]  Mark Chen,et al.  Language Models are Few-Shot Learners , 2020, NeurIPS.

[22]  S. Levine,et al.  Learning Agile Robotic Locomotion Skills by Imitating Animals , 2020, Robotics: Science and Systems.

[23]  Chelsea Finn,et al.  Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24]  Sehoon Ha,et al.  Learning Fast Adaptation With Meta Strategy Optimization , 2019, IEEE Robotics and Automation Letters.

[25]  C. Karen Liu,et al.  Sim-to-Real Transfer for Biped Locomotion , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[26]  Joonho Lee,et al.  Learning agile and dynamic motor skills for legged robots , 2019, Science Robotics.

[27]  Yevgen Chebotar,et al.  Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[28]  Sergey Levine,et al.  Learning to Adapt in Dynamic, Real-World Environments through Meta-Reinforcement Learning , 2018, ICLR.

[29]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[30]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[31]  Vladlen Koltun,et al.  An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.

[32]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[34]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[35]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[36]  Greg Turk,et al.  Preparing for the Unknown: Learning a Universal Policy with Online System Identification , 2017, Robotics: Science and Systems.

[37]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[38]  Thomas Bräunl,et al.  Leveraging multiple simulators for crossing the reality gap , 2012, 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV).

[39]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[40]  Stéphane Doncieux,et al.  Crossing the reality gap in evolutionary robotics by promoting transferable controllers , 2010, GECCO '10.

[41]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.