Learning multiple gaits of quadruped robot using hierarchical reinforcement learning

There is growing interest in using reinforcement learning to train velocity-command tracking controllers for quadruped robots, owing to the robustness and scalability of the approach. However, a single policy trained end-to-end usually exhibits a single gait regardless of the commanded velocity. This may be suboptimal, given that quadruped animals have an optimal gait that depends on velocity [1], [2]. In this work, we propose a hierarchical controller for a quadruped robot that can generate multiple gaits (i.e., pace, trot, bound) while tracking a velocity command. Our controller is composed of two policies, one acting as a central pattern generator and the other as a local feedback controller, trained with hierarchical reinforcement learning. Experimental results show 1) the existence of an optimal gait for specific velocity ranges and 2) the efficiency of our hierarchical controller compared to a single-policy controller, which usually exhibits a single gait. The code is publicly available.
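As a rough illustration of what distinguishes the three gaits named above, the sketch below encodes pace, trot, and bound as fixed per-leg phase offsets within a gait cycle and derives a stance/swing pattern from them. This is not the method of this work, in which the pattern generator is a learned policy; the names GAIT_OFFSETS, leg_phases, and in_stance, the leg ordering, and the 0.5 duty factor are all assumptions made for illustration.

```python
# Hypothetical sketch: textbook phase offsets (fraction of one gait cycle).
# Assumed leg order: front-left, front-right, rear-left, rear-right.
GAIT_OFFSETS = {
    "trot":  (0.0, 0.5, 0.5, 0.0),  # diagonal leg pairs move together
    "pace":  (0.0, 0.5, 0.0, 0.5),  # same-side leg pairs move together
    "bound": (0.0, 0.0, 0.5, 0.5),  # front pair and rear pair alternate
}


def leg_phases(gait, t, frequency_hz):
    """Return each leg's phase in [0, 1) at time t (seconds)."""
    base = (t * frequency_hz) % 1.0
    return tuple((base + offset) % 1.0 for offset in GAIT_OFFSETS[gait])


def in_stance(phases, duty_factor=0.5):
    """A leg is assumed to be in stance during the first duty_factor of its cycle."""
    return tuple(phase < duty_factor for phase in phases)


if __name__ == "__main__":
    for gait in GAIT_OFFSETS:
        print(gait, in_stance(leg_phases(gait, t=0.0, frequency_hz=2.0)))
```

In a hierarchical controller of the kind described, a high-level policy could in principle select or modulate such phase signals while a low-level policy tracks them with joint-level feedback; how the two policies actually interact here is not specified in the abstract.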

[1] Glen Berseth, et al. Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control, 2018, ICLR.

[2] Jason Weston, et al. Curriculum learning, 2009, ICML '09.

[3] Hiroki Yamamoto, et al. Generalization of movements in quadruped robot locomotion by learning specialized motion data, 2020, ROBOMECH Journal.

[4] M. Reza Emami, et al. Gait Optimization for Quadruped Rovers, 2019, Robotica.

[5] Alan Fern, et al. Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition, 2020, 2021 IEEE International Conference on Robotics and Automation (ICRA).

[6] Wangbo, et al. Learning Agile and Dynamic Motor Skills for Legged Robots, 2019.

[7] Jie Tan, et al. Learning Agile Robotic Locomotion Skills by Imitating Animals, 2020, RSS 2020.

[8] Marco Hutter, et al. Gait and Trajectory Optimization for Legged Systems Through Phase-Based End-Effector Parameterization, 2018, IEEE Robotics and Automation Letters.

[9] Lorenz Wellhausen, et al. Learning quadrupedal locomotion over challenging terrain, 2020, Science Robotics.

[10] Glen Berseth, et al. DeepLoco, 2017, ACM Trans. Graph.

[11] W. Marsden I and J, 2012.

[12] Sergey Levine, et al. DeepMimic, 2018, ACM Trans. Graph.

[13] Bharadwaj S. Amrutur, et al. Realizing Learned Quadruped Locomotion Behaviors through Kinematic Motion Primitives, 2018, 2019 International Conference on Robotics and Automation (ICRA).

[14] Yevgeniy Yesilevskiy, et al. Selecting gaits for economical locomotion of legged robots, 2016, Int. J. Robotics Res.

[15] Marco Hutter, et al. Per-Contact Iteration Method for Solving Contact Dynamics, 2018, IEEE Robotics and Automation Letters.

[16] Yasuhiro Fukuoka, et al. A simple rule for quadrupedal gait generation determined by leg loading feedback: a modeling study, 2015, Scientific Reports.

[17] Yuval Tassa, et al. Learning human behaviors from motion capture by adversarial imitation, 2017, ArXiv.

[18] Sergey Levine, et al. AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control, 2021, ACM Trans. Graph.

[19] Yingying Wang, et al. Efficient Neural Networks for Real-time Motion Style Transfer, 2019, PACMCGIT.

[20] Akio Ishiguro, et al. A Quadruped Robot Exhibiting Spontaneous Gait Transitions from Walking to Trotting to Galloping, 2017, Scientific Reports.

[21] Taku Komura, et al. Few‐shot Learning of Homogeneous Human Locomotion Styles, 2018, Comput. Graph. Forum.

[22] D. Owaki, et al. Simple robot suggests physical interlimb communication is essential for quadruped walking, 2013, Journal of The Royal Society Interface.

[23] D. F. Hoyt, et al. Gait and the energetics of locomotion in horses, 1981, Nature.

[24] Atil Iscen, et al. Sim-to-Real: Learning Agile Locomotion For Quadruped Robots, 2018, Robotics: Science and Systems.

[25] Kemal Leblebicioglu, et al. Free gait generation with reinforcement learning for a six-legged robot, 2008, Robotics Auton. Syst.

[26] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[27] Joonho Lee, et al. Learning agile and dynamic motor skills for legged robots, 2019, Science Robotics.