Learning Coordinated Terrain-Adaptive Locomotion by Imitating a Centroidal Dynamics Planner

Dynamic quadruped locomotion over challenging terrains with precise foot placements is a hard problem for both optimal control methods and Reinforcement Learning (RL). Non-linear solvers can produce coordinated, constraint-satisfying motions, but they often take too long to converge for online use. RL methods can learn dynamic, reactive controllers, but they require carefully tuned shaping rewards to produce good gaits and can struggle to discover precise, coordinated movements. Imitation learning circumvents this problem and has been used with motion capture data to extract quadruped gaits for flat terrains. However, acquiring motion capture data for a very large variety of terrains with height differences would be costly. In this work, we combine the advantages of trajectory optimization and learning methods and show that terrain-adaptive controllers can be obtained by training policies to imitate trajectories planned over procedural terrains by a non-linear solver. We show that the learned policies transfer to unseen terrains and can be fine-tuned to dynamically traverse challenging terrains that require precise foot placements and are very hard to solve with standard RL.
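As a rough illustration of the imitation objective described above, the sketch below shows a DeepMimic-style tracking reward that scores the simulated robot state against the planner's reference trajectory at the same timestep, so the policy is rewarded for reproducing the solver's coordinated motions. This is a minimal sketch under stated assumptions, not the paper's actual formulation: the dictionary keys, weights, and exponential scales are hypothetical placeholders.

```python
import numpy as np

def tracking_reward(state, ref, w_joint=0.5, w_base=0.3, w_vel=0.2):
    """Hypothetical imitation reward: score how closely the simulated
    state matches the planner's reference at the same timestep.
    `state` and `ref` are dicts of numpy arrays with (assumed) keys
    "q" (joint positions), "qd" (joint velocities), "base_pos"
    (base position). Weights and scales are illustrative only."""
    # Exponentiated squared tracking errors, one term per quantity,
    # so each term lies in (0, 1] and peaks at an exact match.
    r_joint = np.exp(-2.0 * np.sum((state["q"] - ref["q"]) ** 2))
    r_base = np.exp(-10.0 * np.sum((state["base_pos"] - ref["base_pos"]) ** 2))
    r_vel = np.exp(-0.1 * np.sum((state["qd"] - ref["qd"]) ** 2))
    return w_joint * r_joint + w_base * r_base + w_vel * r_vel
```

In such a setup, the reference trajectories would be computed offline by the non-linear solver over procedurally generated terrains, and the reward above would be maximized with a standard RL algorithm during training.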
