Learning symmetric and low-energy locomotion

Learning locomotion skills is a challenging problem. To generate realistic and smooth locomotion, existing methods use motion capture, finite state machines, or morphology-specific knowledge to guide the motion generation algorithms. Deep reinforcement learning (DRL) is a promising approach for the automatic creation of locomotion controllers. Indeed, a standard benchmark for DRL is to automatically create a running controller for a biped character from a simple reward function [Duan et al. 2016]. Although several different DRL algorithms can successfully create a running controller, the resulting motions usually look nothing like those of a real runner. This paper takes a minimalist learning approach to the locomotion problem, without the use of motion examples, finite state machines, or morphology-specific knowledge. We introduce two modifications to the DRL approach that, when used together, produce locomotion behaviors that are symmetric, low-energy, and much closer to those of a real person. First, we introduce a new term in the loss function (not the reward function) that encourages symmetric actions. Second, we introduce a new curriculum learning method that provides modulated physical assistance to help the character with left/right balance and forward movement. The algorithm computes appropriate assistance for the character automatically and gradually relaxes this assistance, so that eventually the character learns to move entirely without help. Because our method does not rely on motion capture data, it can be applied to a variety of character morphologies. We demonstrate locomotion controllers for the lower half of a biped, a full humanoid, a quadruped, and a hexapod. Our results show that the learned policies produce symmetric, low-energy gaits. In addition, speed-appropriate gait patterns emerge without any guidance from motion examples or contact planning.
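
To make the first idea concrete, the mirror-symmetry term can be expressed as an auxiliary loss on the policy network itself: the action the policy picks in a mirrored state should be the mirror image of the action it picks in the original state. The sketch below is a minimal PyTorch illustration, not the paper's implementation; `policy` is assumed to be a network mapping batches of states to mean actions, and `M_s` and `M_a` are hypothetical permutation-and-negation matrices that mirror a state and an action across the character's sagittal plane.

```python
import torch

def mirror_symmetry_loss(policy, states, M_s, M_a):
    """Auxiliary loss encouraging left/right-symmetric actions.

    policy : callable mapping (batch, state_dim) -> (batch, action_dim)
    M_s    : (state_dim, state_dim) matrix that mirrors a state
    M_a    : (action_dim, action_dim) matrix that mirrors an action
    """
    a = policy(states)                  # pi(s)
    a_mirror = policy(states @ M_s.T)   # pi(M_s s)
    # || pi(s) - M_a pi(M_s s) ||^2, averaged over the batch
    return ((a - a_mirror @ M_a.T) ** 2).sum(dim=-1).mean()

# Added with a weight w_sym to the usual policy-gradient objective,
# e.g. a PPO surrogate loss:
#   loss = ppo_loss + w_sym * mirror_symmetry_loss(policy, states, M_s, M_a)
```

Because the penalty lives in the loss rather than the reward, it shapes the policy's outputs directly and leaves the task the agent is rewarded for unchanged.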
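
The second idea, curriculum-scheduled assistance, can likewise be sketched as external helper forces whose strength decays to zero over training. Everything below is hypothetical scaffolding for illustration: the paper modulates the assistance automatically based on the character's progress, whereas this sketch assumes a fixed linear schedule, an x-forward/y-lateral axis convention, and made-up gains `kp`, `kd`, and `k_prop`.

```python
import numpy as np

def assist_force(pelvis_pos, pelvis_vel, target_speed, progress,
                 kp=400.0, kd=40.0, k_prop=200.0):
    """Virtual assistive force applied at the pelvis.

    progress in [0, 1] is the curriculum variable: assistance is full
    at progress = 0 and vanishes entirely at progress = 1.
    """
    strength = 1.0 - progress  # hypothetical linear annealing schedule

    # PD force pulling the pelvis back toward the sagittal plane (y = 0),
    # standing in for the left/right balance a human helper would provide.
    f_balance = -kp * pelvis_pos[1] - kd * pelvis_vel[1]

    # Propulsion force nudging the character toward the target speed
    # along the forward (x) axis.
    f_forward = k_prop * (target_speed - pelvis_vel[0])

    return strength * np.array([f_forward, f_balance, 0.0])
```

The point of the sketch is only the shape of the mechanism: helper forces on the trunk, driven by a curriculum variable, that reach zero before training ends so the final policy moves entirely without help.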

[1] B. Nigg et al. Use of force platform variables to quantify the effects of chiropractic manipulation on gait symmetry, 1987, Journal of Manipulative and Physiological Therapeutics.

[2] B. Nigg et al. Asymmetries in ground reaction force patterns in normal human gait, 1989, Medicine and Science in Sports and Exercise.

[3] Michael I. Jordan et al. Advances in Neural Information Processing Systems 30, 1995.

[4] Michiel van de Panne et al. Guided Optimization for Balanced Locomotion, 1995.

[5] David C. Brogan et al. Animating human athletics, 1995, SIGGRAPH.

[6] Nikolaus Hansen et al. Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation, 1996, Proceedings of IEEE International Conference on Evolutionary Computation.

[7] Yishay Mansour et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[8] C. Karen Liu et al. Learning physics-based motion style with nonlinear inverse optimization, 2005, SIGGRAPH.

[9] Kwang Won Sok et al. Simulating biped behaviors from human motion data, 2007, ACM Trans. Graph.

[10] M. van de Panne et al. SIMBICON: simple biped locomotion control, 2007, SIGGRAPH.

[11] Philippe Beaudoin et al. Continuation methods for adapting simulated skills, 2008, ACM Trans. Graph.

[12] Marco da Silva et al. Interactive simulation of stylized human locomotion, 2008, ACM Trans. Graph.

[13] Kara K. Patterson et al. Gait asymmetry in community-ambulating stroke survivors, 2008, Archives of Physical Medicine and Rehabilitation.

[14] David J. Fleet et al. Optimizing walking controllers, 2009, SIGGRAPH.

[15] Zoran Popović et al. Optimal gait and form for animal locomotion, 2009, ACM Trans. Graph.

[16] C. Karen Liu et al. Optimization-based interactive motion synthesis, 2009, ACM Trans. Graph.

[17] Jason Weston et al. Curriculum learning, 2009, ICML '09.

[18] Zoran Popović et al. Contact-aware nonlinear control of dynamic characters, 2009, SIGGRAPH.

[19] M. van de Panne et al. Generalized biped walking control, 2010, ACM Trans. Graph.

[20] Zoran Popović et al. Terrain-adaptive bipedal locomotion control, 2010, SIGGRAPH.

[21] C. Karen Liu et al. Optimal feedback control for character animation using an abstract model, 2010, SIGGRAPH.

[22] Jehee Lee et al. Data-driven biped control, 2010, SIGGRAPH.

[23] Aaron Hertzmann et al. Feature-based locomotion controllers, 2010, SIGGRAPH.

[24] A. Karpathy et al. Locomotion skills for simulated quadrupeds, 2011, SIGGRAPH.

[25] C. Karen Liu et al. Stable Proportional-Derivative Controllers, 2011, IEEE Computer Graphics and Applications.

[26] Michiel van de Panne et al. Curriculum Learning for Motor Skills, 2012, Canadian Conference on AI.

[27] Zoran Popović et al. Discovery of complex behaviors through contact-invariant optimization, 2012, ACM Trans. Graph.

[28] Nicolas Pronost et al. Interactive Character Animation Using Simulated Physics: A State-of-the-Art Review, 2012, Comput. Graph. Forum.

[29] Vladlen Koltun et al. Optimizing locomotion controllers using biologically-based actuators and objectives, 2012, ACM Trans. Graph.

[30] Aaron Hertzmann et al. Trajectory Optimization for Full-Body Movements with Complex Contacts, 2013, IEEE Transactions on Visualization and Computer Graphics.

[31] Vladlen Koltun et al. Animating human lower limbs using contact-invariant optimization, 2013, ACM Trans. Graph.

[32] Michiel van de Panne et al. Flexible muscle-based locomotion for bipedal creatures, 2013, ACM Trans. Graph.

[33] Sergey Levine et al. Learning Complex Neural Network Policies with Trajectory Optimization, 2014, ICML.

[34] Zoran Popović et al. Generalizing locomotion style to new animals with inverse optimal regression, 2014, ACM Trans. Graph.

[35] Sehoon Ha et al. Iterative Training of Dynamic Skills Inspired by Human Coaching Techniques, 2014, ACM Trans. Graph.

[36] Taesoo Kwon et al. Locomotion control for many-muscle humanoids, 2014, ACM Trans. Graph.

[37] C. Karen Liu et al. Learning bicycle stunts, 2014, ACM Trans. Graph.

[38] Christoph H. Lampert et al. Curriculum learning of multiple tasks, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Sergey Levine et al. Trust Region Policy Optimization, 2015, ICML.

[40] Glen Berseth et al. Dynamic terrain traversal skills using reinforcement learning, 2015, ACM Trans. Graph.

[41] Geoffrey E. Hinton et al. Deep Learning, 2015, Nature.

[42] Zoran Popović et al. Interactive Control of Diverse Complex Characters with Neural Networks, 2015, NIPS.

[43] Yuval Tassa et al. Continuous control with deep reinforcement learning, 2015, ICLR.

[44] Yan Duan et al. Benchmarking Deep Reinforcement Learning for Continuous Control, 2016, ICML.

[45] Guigang Zhang et al. Deep Learning, 2016, Int. J. Semantic Comput.

[46] Alex Graves et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.

[47] Peter Stone et al. Source Task Creation for Curriculum Learning, 2016, AAMAS.

[48] Libin Liu et al. Guided Learning of Control Graphs for Physics-Based Characters, 2016, ACM Trans. Graph.

[49] Glen Berseth et al. Terrain-adaptive locomotion skills using deep reinforcement learning, 2016, ACM Trans. Graph.

[50] Katja D. Mombaur et al. Synthesis of full-body 3-D human gait using optimal control methods, 2016, IEEE International Conference on Robotics and Automation (ICRA).

[51] Sergey Levine et al. High-Dimensional Continuous Control Using Generalized Advantage Estimation, 2015, ICLR.

[52] Abhinav Gupta et al. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours, 2016, IEEE International Conference on Robotics and Automation (ICRA).

[53] Wojciech Zaremba et al. OpenAI Gym, 2016, ArXiv.

[54] Libin Liu et al. Learning to Schedule Control Fragments for Physics-Based Characters Using Deep Q-Learning, 2017, ACM Trans. Graph.

[55] Alex Graves et al. Automated Curriculum Learning for Neural Networks, 2017, ICML.

[56] Jungdam Won et al. How to train your dragon: example-guided control of flapping flight, 2017, ACM Trans. Graph.

[57] Pieter Abbeel et al. Reverse Curriculum Generation for Reinforcement Learning, 2017, CoRL.

[58] Glen Berseth et al. DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning, 2017, ACM Trans. Graph.

[59] Yuval Tassa et al. Emergence of Locomotion Behaviours in Rich Environments, 2017, ArXiv.

[60] Alec Radford et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[61] Pieter Abbeel et al. Automatic Goal Generation for Reinforcement Learning Agents, 2017, ICML.

[62] John Schulman et al. Teacher–Student Curriculum Learning, 2017, IEEE Transactions on Neural Networks and Learning Systems.