Learning Agile Robotic Locomotion Skills by Imitating Animals

Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise in the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.
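
The abstract summarizes the approach but does not spell out the learning objective or the adaptation procedure. As a rough illustration, the sketch below shows (1) a simplified motion-imitation reward that scores how closely the robot tracks a retargeted reference pose at each control step, in the spirit of DeepMimic-style tracking objectives, and (2) a crude deployment-time adaptation loop for a latent-conditioned policy. This is a hedged sketch only: the function names, weights, scales, latent dimension, and the random-search adaptation strategy are illustrative assumptions, not the paper's exact formulation (a complete imitation reward would typically also track end-effector and root motion, and the paper's adaptation method is more sample-efficient than random search).

```python
import numpy as np

def imitation_reward(q, qd, q_ref, qd_ref,
                     w_pose=0.5, w_vel=0.05, k_pose=5.0, k_vel=0.1):
    """Score one control step against the retargeted reference motion.

    q, qd         -- robot joint positions / velocities (np.ndarray)
    q_ref, qd_ref -- reference joint positions / velocities for this step
    Weights and exponential scales are illustrative placeholders,
    not the paper's published values.
    """
    r_pose = np.exp(-k_pose * np.sum((q_ref - q) ** 2))  # joint-pose tracking
    r_vel = np.exp(-k_vel * np.sum((qd_ref - qd) ** 2))  # joint-velocity tracking
    return w_pose * r_pose + w_vel * r_vel

def adapt_latent(rollout_return, z_dim=8, n_trials=20, seed=0):
    """Deployment-time adaptation of a latent-conditioned policy.

    rollout_return -- callable mapping a latent vector z to the return of
                      one rollout on the physical robot (hypothetical hook).
    Random search over z stands in here for the paper's more
    sample-efficient adaptation update.
    """
    rng = np.random.default_rng(seed)
    best_z, best_ret = None, -np.inf
    for _ in range(n_trials):
        z = rng.standard_normal(z_dim)  # candidate encoding of the real dynamics
        ret = rollout_return(z)         # evaluate with a single hardware trial
        if ret > best_ret:
            best_z, best_ret = z, ret
    return best_z
```

In a design like this, training in simulation conditions the policy on an encoding of the randomized dynamics; at deployment, only that low-dimensional encoding is searched, which is what makes adaptation with a handful of real-world trials plausible.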
