Reinforcement Learning with Evolutionary Trajectory Generator: A General Approach for Quadrupedal Locomotion

Recently, reinforcement learning (RL) has emerged as a promising approach for quadrupedal locomotion, as it can reduce the manual effort required by conventional approaches such as designing skill-specific controllers. However, due to the complex nonlinear dynamics of quadrupedal robots and reward sparsity, it is still difficult for RL to learn effective gaits from scratch, especially in challenging tasks such as walking over a balance beam. To alleviate this difficulty, we propose a novel RL-based approach that contains an evolutionary foot trajectory generator. Unlike prior methods that use a fixed trajectory generator, our generator continually optimizes the shape of the output trajectory for the given task, providing diversified motion priors to guide the policy learning. The policy is trained with reinforcement learning to output residual control signals that fit different gaits. We then optimize the trajectory generator and the policy network alternately to stabilize the training, and share the exploratory data between them to improve sample efficiency. As a result, our approach can solve a range of challenging tasks in simulation by learning from scratch, including walking on a balance beam and crawling through a cave. To further verify the effectiveness of our approach, we deploy the controller learned in simulation on a 12-DoF quadrupedal robot, and it successfully traverses challenging scenarios with efficient gaits.

* Equal contribution. † Corresponding author.
1 Haojie Shi is with the Chinese University of Hong Kong, Hong Kong (email: h.shi@link.cuhk.edu.hk).
2 Bo Zhou, Hongsheng Zeng, Fan Wang, Yueqiang Dong, Jiangyong Li, Kang Wang, and Hao Tian are with Baidu Inc., China (email: zhoubo01, zenghongsheng, wang.fan, dongyueqiang, lijiangyong01, wangkang02, tianhao@baidu.com).
3 Max Q.-H. Meng is with the Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China, on leave from the Department of Electronic Engineering, the Chinese University of Hong Kong, Hong Kong, and also with the Shenzhen Research Institute of the Chinese University of Hong Kong, Shenzhen, China (email: max.meng@cuhk.edu.hk).
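To make the alternating scheme concrete, the sketch below shows one way it could be organized: an evolution strategy (in the style of Salimans et al., 2017) updates the shape parameters of a cyclic foot trajectory generator, while a placeholder marks where the off-policy RL update of the residual policy would run on the shared exploratory data. The names (TrajectoryGenerator, rollout, es_step), the Fourier parameterization, and the toy objective are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the alternating optimization described above, in NumPy.
# All class/function names and the trajectory parameterization are
# hypothetical stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)

class TrajectoryGenerator:
    """Cyclic foot trajectory whose shape is set by a parameter vector.

    Here the height profile is a truncated Fourier series over the gait
    phase; the paper's generator may differ.
    """
    def __init__(self, params):
        self.params = params  # shape (n_harmonics, 2): sine/cosine amplitudes

    def foot_height(self, phase):
        k = np.arange(1, len(self.params) + 1)
        return (self.params[:, 0] @ np.sin(k * phase)
                + self.params[:, 1] @ np.cos(k * phase))

def rollout(tg_params, policy_params):
    """Placeholder return estimate. A real system would simulate the robot,
    add the residual policy's action to the generator's output, and return
    the episode reward (while logging transitions for the RL learner)."""
    tg = TrajectoryGenerator(tg_params)
    phases = np.linspace(0.0, 2.0 * np.pi, 50)
    heights = np.array([tg.foot_height(p) for p in phases])
    # Toy objective: prefer a moderate mean swing height and smoothness.
    return -abs(heights.mean() - 0.05) - 0.1 * np.abs(np.diff(heights)).sum()

def es_step(tg_params, policy_params, pop_size=16, sigma=0.02, lr=0.05):
    """One evolution-strategies update of the trajectory generator:
    evaluate Gaussian perturbations of the shape parameters and move
    along the advantage-weighted noise direction."""
    noise = rng.standard_normal((pop_size,) + tg_params.shape)
    returns = np.array([rollout(tg_params + sigma * n, policy_params)
                        for n in noise])
    advantages = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = (advantages[:, None, None] * noise).mean(axis=0) / sigma
    return tg_params + lr * grad

tg_params = rng.standard_normal((3, 2)) * 0.01
policy_params = None  # stand-in for the residual policy (e.g., trained by SAC)

for it in range(100):
    tg_params = es_step(tg_params, policy_params)  # evolve trajectory shape
    # ... an off-policy RL update (e.g., SAC) would train the residual
    # policy here on the shared exploratory rollouts before the next
    # ES step, stabilizing the two learners against each other ...
print("optimized generator params:\n", tg_params)
```

The key design choice this sketch reflects is that the generator and the policy never update simultaneously: each sees a fixed counterpart during its own update, and the rollouts collected for the ES evaluations can double as training data for the RL learner.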
