DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

A longstanding goal in character animation is to combine data-driven specification of behavior with a system that can execute a similar behavior in a physical simulation, thus enabling realistic responses to perturbations and environmental variation. We show that well-known reinforcement learning (RL) methods can be adapted to learn robust control policies capable of imitating a broad range of example motion clips, while also learning complex recoveries, adapting to changes in morphology, and accomplishing user-specified goals. Our method handles keyframed motions, highly dynamic actions such as motion-captured flips and spins, and retargeted motions. By combining a motion-imitation objective with a task objective, we can train characters that react intelligently in interactive settings, e.g., by walking in a desired direction or throwing a ball at a user-specified target. This approach thus combines the convenience and motion quality of using motion clips to define the desired style and appearance, with the flexibility and generality afforded by RL methods and physics-based animation. We further explore a number of methods for integrating multiple clips into the learning process to develop multi-skilled agents capable of performing a rich repertoire of diverse skills. We demonstrate results using multiple characters (human, Atlas robot, bipedal dinosaur, dragon) and a large variety of skills, including locomotion, acrobatics, and martial arts.
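The combination of a motion-imitation objective with a task objective mentioned above can be made concrete as a weighted sum of reward terms. The sketch below is illustrative only: the specific weights, the exponentiated pose-tracking kernel, and the heading-based goal term are assumptions chosen for exposition, not values or definitions taken from this abstract.

```python
import numpy as np

# Illustrative weights for the two objectives (assumed, not from the paper text).
W_IMITATION = 0.7
W_TASK = 0.3

def imitation_reward(pose, ref_pose):
    """Exponentiated tracking error: close to 1 when the simulated pose
    matches the reference pose sampled from the motion clip, decaying
    toward 0 as the error grows."""
    err = np.sum((pose - ref_pose) ** 2)
    return np.exp(-2.0 * err)

def task_reward(heading, target_heading):
    """Example goal term for the walk-in-a-desired-direction task: cosine
    similarity between the character's heading and the user-specified
    direction, rescaled to [0, 1]. Both inputs are unit vectors."""
    return 0.5 * (1.0 + np.dot(heading, target_heading))

def reward(pose, ref_pose, heading, target_heading):
    """Composite reward: imitate the clip while accomplishing the goal."""
    return (W_IMITATION * imitation_reward(pose, ref_pose)
            + W_TASK * task_reward(heading, target_heading))

if __name__ == "__main__":
    pose = np.array([0.10, 0.20, 0.30])       # toy joint-angle vector
    ref_pose = np.array([0.10, 0.25, 0.28])   # reference from the clip
    heading = np.array([1.0, 0.0])
    target = np.array([0.0, 1.0])
    print(reward(pose, ref_pose, heading, target))
```

In the interactive examples from the abstract, only the task term would change across goals (direction-following versus throwing a ball at a target), while the imitation term keeps the resulting motion close in style to the reference clip.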
