AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control

Synthesizing graceful and life-like behaviors for physically simulated characters has been a fundamental challenge in computer animation. Data-driven methods that leverage motion tracking are a prominent class of techniques for producing high-fidelity motions for a wide range of behaviors. However, the effectiveness of these tracking-based methods often hinges on carefully designed objective functions, and when applied to large and diverse motion datasets, these methods require significant additional machinery to select the appropriate motion for the character to track in a given scenario. In this work, we propose to obviate the need to manually design imitation objectives and mechanisms for motion selection by utilizing a fully automated approach based on adversarial imitation learning. High-level task objectives that the character should perform can be specified by relatively simple reward functions, while the low-level style of the character’s behaviors can be specified by a dataset of unstructured motion clips, without any explicit clip selection or sequencing. For example, a character traversing an obstacle course might utilize a task reward that only considers forward progress, while the dataset contains clips of relevant behaviors such as running, jumping, and rolling. These motion clips are used to train an adversarial motion prior, which specifies style rewards for training the character through reinforcement learning (RL). The adversarial RL procedure automatically selects which motion to perform, dynamically interpolating and generalizing from the dataset.
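To make the reward structure concrete, the sketch below illustrates how an adversarial motion prior can supply the style reward. A discriminator is trained to distinguish state transitions from the reference motion dataset from transitions produced by the simulated character, and its output is mapped to a bounded style reward that is combined with the simple task reward. The style-reward mapping follows the paper's least-squares GAN formulation; the network architecture, hidden sizes, helper names, and the 0.5/0.5 weighting are illustrative assumptions, not the exact implementation.

```python
# Minimal sketch (PyTorch) of an AMP-style reward. Architecture and weights
# here are assumptions for illustration; only the style-reward formula
# r_style = max(0, 1 - 0.25 * (D(s, s') - 1)^2) follows the paper.
import torch
import torch.nn as nn


class MotionDiscriminator(nn.Module):
    """Scores state transitions (s, s'); trained with a least-squares GAN
    objective to output ~1 on reference motion-capture transitions and
    ~-1 on transitions generated by the policy."""

    def __init__(self, obs_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        # Concatenate the transition (s, s') and score it with a scalar.
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)


def style_reward(disc: MotionDiscriminator,
                 s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    # Map the discriminator score to a reward bounded in [0, 1].
    d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)


def total_reward(r_task: torch.Tensor, r_style: torch.Tensor,
                 w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    # The RL policy maximizes a weighted sum of the hand-specified task
    # reward and the discriminator-derived style reward.
    return w_task * r_task + w_style * r_style
```

Bounding the style reward in [0, 1] via the clamp keeps the adversarial signal well-scaled relative to the task reward, so the policy cannot be dominated by unbounded discriminator scores during training.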
