Physics-based motion capture imitation with deep reinforcement learning

We introduce a deep reinforcement learning method that learns to control articulated humanoid bodies so that, when simulated in a physics engine, they closely imitate given target motions. The target motion, which may be previously unseen by the agent and may be noisy, is supplied at runtime. Our method recovers balance from moderate external disturbances while continuing to imitate the target motion; when large disturbances cause the humanoid to fall, it controls the character to get up and resume tracking the motion. The method is trained on mocap clips from the CMU motion capture database and several other publicly available databases. We use a state-of-the-art deep reinforcement learning algorithm to learn to dynamically modulate the gains of PD controllers, whose target angles are derived from the mocap clip, and to apply corrective torques, with the goal of imitating the provided motion clip as closely as possible. Both the simulation and the learning algorithm are parallelized and run on the GPU. We demonstrate that the proposed method can control the character to imitate a wide variety of motions, including running, walking, dancing, jumping, kicking, punching, and standing up.
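To make the action parameterization concrete, the minimal sketch below (Python with NumPy) shows one plausible reading of the controller: the policy outputs per-joint PD gains and corrective torques, while the PD target angles come from the current mocap frame. The critical-damping rule for the derivative gain and all variable names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def pd_with_correction(q, q_dot, q_target, kp, tau_corr):
    """Per-joint torque: PD tracking of the mocap target angles plus a
    learned corrective torque. The derivative gain kd is tied to kp for
    critical damping, a common convention assumed here for illustration."""
    kd = 2.0 * np.sqrt(kp)
    return kp * (q_target - q) - kd * q_dot + tau_corr

# Toy usage with random stand-ins; in the actual method, kp and tau_corr
# come from the trained policy and q, q_dot from the physics simulator.
n_joints = 30
q        = np.random.uniform(-0.1, 0.1, n_joints)    # current joint angles
q_dot    = np.zeros(n_joints)                        # current joint velocities
q_target = np.random.uniform(-0.1, 0.1, n_joints)    # angles from the mocap frame
kp       = np.random.uniform(50.0, 500.0, n_joints)  # gains chosen by the policy
tau_corr = np.random.uniform(-5.0, 5.0, n_joints)    # corrective torques from the policy

tau = pd_with_correction(q, q_dot, q_target, kp, tau_corr)
```

Tying kd to kp keeps the controller critically damped as the policy varies joint stiffness, which is one common motivation for exposing PD gains as actions rather than fixing them.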
