Learning Agile Locomotion Skills with a Mentor

Developing agile behaviors for legged robots remains a challenging problem. While deep reinforcement learning is a promising approach, learning truly agile behaviors typically requires tedious reward shaping and careful curriculum design. We formulate agile locomotion as a multi-stage learning problem in which a mentor guides the agent throughout training. The mentor is optimized to place checkpoints that guide the movement of the robot's center of mass, while the student (i.e., the robot) learns to reach these checkpoints. Once the student can solve the task, we teach it to perform the task without the mentor. We evaluate the proposed learning system with a simulated quadruped robot on a course of randomly generated gaps and hurdles. Our method significantly outperforms a single-stage RL baseline without a mentor, and the quadruped robot can run and jump agilely across gaps and over obstacles. Finally, we present a detailed analysis of the feasibility and efficiency of the learned behaviors.
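
To make the mentor/student idea more concrete, here is a minimal Python sketch of how a student's per-step reward might combine a task reward with a bonus for reaching a mentor-placed checkpoint, and how that bonus might be phased out once the student no longer needs the mentor. Everything here is an illustrative assumption rather than a detail of the method described above: the `Checkpoint` type, the exponential distance kernel and its `scale`, the two-stage numbering, and the linear `anneal_iters` schedule are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    """Hypothetical target for the robot's center of mass (CoM), placed by the mentor."""
    x: float  # horizontal position (m)
    z: float  # height (m)


def checkpoint_reward(com_x: float, com_z: float, cp: Checkpoint, scale: float = 0.1) -> float:
    """Dense bonus in (0, 1]; equals 1 exactly when the CoM sits on the checkpoint.

    The exponential kernel and `scale` are illustrative assumptions.
    """
    d2 = (com_x - cp.x) ** 2 + (com_z - cp.z) ** 2
    return math.exp(-d2 / scale)


def mentor_weight(stage: int, iteration: int, anneal_iters: int) -> float:
    """Weight applied to the mentor's checkpoint bonus.

    Stage 1: mentor fully active (weight 1).
    Stage 2: weight decays linearly to 0 over `anneal_iters` iterations, so the
    student ends up relying on the task reward alone. (Hypothetical schedule.)
    """
    if anneal_iters <= 0:
        raise ValueError("anneal_iters must be positive")
    if stage == 1:
        return 1.0
    return max(0.0, 1.0 - iteration / anneal_iters)


def student_reward(task_reward: float, com_x: float, com_z: float,
                   cp: Optional[Checkpoint], stage: int, iteration: int,
                   anneal_iters: int = 1000) -> float:
    """Task reward plus the (possibly annealed) checkpoint bonus; no bonus if no checkpoint."""
    bonus = 0.0 if cp is None else checkpoint_reward(com_x, com_z, cp)
    return task_reward + mentor_weight(stage, iteration, anneal_iters) * bonus


# Usage: the CoM sits exactly on the checkpoint, so the bonus is 1.0.
cp = Checkpoint(x=1.2, z=0.45)
print(student_reward(0.5, 1.2, 0.45, cp, stage=1, iteration=0))    # 1.5 (mentor fully active)
print(student_reward(0.5, 1.2, 0.45, cp, stage=2, iteration=500))  # 1.0 (bonus halved)
```

Keeping the checkpoint term additive and annealing its weight, rather than switching it off abruptly, is one simple way to express a hand-off from mentor-guided training to mentor-free execution; other schedules (for example, dropping checkpoints with increasing probability) would serve the same illustrative purpose.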
