SFV

Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts. (Video1)

[1]  Hsi-Jian Lee,et al.  Determination of 3D human body postures from a single view , 1985, Comput. Vis. Graph. Image Process..

[2]  Jing Peng,et al.  Function Optimization using Connectionist Reinforcement Learning Algorithms , 1991 .

[3]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[4]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[5]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[6]  Camillo J. Taylor,et al.  Reconstruction of articulated objects from point correspondences in a single uncalibrated image , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[7]  Camillo J. Taylor,et al.  Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image , 2000, Comput. Vis. Image Underst..

[8]  Sham M. Kakade,et al.  A Natural Policy Gradient , 2001, NIPS.

[9]  Marco da Silva,et al.  Interactive simulation of stylized human locomotion , 2008, ACM Trans. Graph..

[10]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[11]  Philippe Beaudoin,et al.  Robust task-based control policies for physics-based characters , 2009, ACM Trans. Graph..

[12]  David J. Fleet,et al.  Optimizing walking controllers for uncertain inputs and environments , 2010, ACM Trans. Graph..

[13]  M. van de Panne,et al.  Generalized biped walking control , 2010, ACM Trans. Graph..

[14]  Yoonsang Lee,et al.  Data-driven biped control , 2010, ACM Trans. Graph..

[15]  M. V. D. Panne,et al.  Sampling-based contact-rich motion control , 2010, ACM Trans. Graph..

[16]  Mark Everingham,et al.  Clustered Pose and Nonlinear Appearance Models for Human Pose Estimation , 2010, BMVC.

[17]  A. Karpathy,et al.  Locomotion skills for simulated quadrupeds , 2011, SIGGRAPH 2011.

[18]  Sergey Levine,et al.  Continuous character control with low-dimensional embeddings , 2012, ACM Trans. Graph..

[19]  Vladlen Koltun,et al.  Optimizing locomotion controllers using biologically-based actuators and objectives , 2012, ACM Trans. Graph..

[20]  Jessica K. Hodgins,et al.  Video-based 3D motion capture through biped control , 2012, ACM Trans. Graph..

[21]  Michiel van de Panne,et al.  Flexible muscle-based locomotion for bipedal creatures , 2013, ACM Trans. Graph..

[22]  Jonathan Tompson,et al.  Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation , 2014, NIPS.

[23]  Christian Szegedy,et al.  DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Zoran Popovic,et al.  Generalizing locomotion style to new animals with inverse optimal regression , 2014, ACM Trans. Graph..

[25]  Taesoo Kwon,et al.  Locomotion control for many-muscle humanoids , 2014, ACM Trans. Graph..

[26]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[27]  Cristian Sminchisescu,et al.  Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Zoran Popovic,et al.  Motion fields for interactive character locomotion , 2010, CACM.

[29]  Michael J. Black,et al.  SMPL: A Skinned Multi-Person Linear Model , 2015, ACM Trans. Graph..

[30]  Sergey Levine,et al.  Trust Region Policy Optimization , 2015, ICML.

[31]  Glen Berseth,et al.  Dynamic terrain traversal skills using reinforcement learning , 2015, ACM Trans. Graph..

[32]  Pieter Abbeel,et al.  Benchmarking Deep Reinforcement Learning for Continuous Control , 2016, ICML.

[33]  Xiaowei Zhou,et al.  Sparseness Meets Deepness: 3D Human Pose Estimation from Monocular Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Peter V. Gehler,et al.  Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image , 2016, ECCV.

[35]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.

[36]  Varun Ramakrishna,et al.  Convolutional Pose Machines , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Libin Liu,et al.  Guided Learning of Control Graphs for Physics-Based Characters , 2016, ACM Trans. Graph..

[38]  Glen Berseth,et al.  Terrain-adaptive locomotion skills using deep reinforcement learning , 2016, ACM Trans. Graph..

[39]  Wei Zhang,et al.  Deep Kinematic Pose Regression , 2016, ECCV Workshops.

[40]  Sergey Levine,et al.  High-Dimensional Continuous Control Using Generalized Advantage Estimation , 2015, ICLR.

[41]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[42]  Taku Komura,et al.  A deep learning framework for character motion synthesis and editing , 2016, ACM Trans. Graph..

[43]  Vincent Lepetit,et al.  Direct Prediction of 3D Body Poses from Motion Compensated Sequences , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Wojciech Zaremba,et al.  OpenAI Gym , 2016, ArXiv.

[45]  Xiaowei Zhou,et al.  Sparse Representation for 3D Shape Estimation: A Convex Relaxation Approach , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Hans-Peter Seidel,et al.  VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera , 2017, ACM Trans. Graph..

[47]  Jungdam Won,et al.  How to train your dragon , 2017, ACM Trans. Graph..

[48]  Taku Komura,et al.  Phase-functioned neural networks for character control , 2017, ACM Trans. Graph..

[49]  Yichen Wei,et al.  Weakly-supervised Transfer for 3D Human Pose Estimation in the Wild , 2017, ArXiv.

[50]  Sergey Levine,et al.  Time-Contrastive Networks: Self-Supervised Learning from Multi-view Observation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[51]  Glen Berseth,et al.  DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning , 2017, ACM Trans. Graph..

[52]  Yaser Sheikh,et al.  OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Xiaowei Zhou,et al.  Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Ersin Yumer,et al.  Self-supervised Learning of Motion Capture , 2017, NIPS.

[55]  Yee Whye Teh,et al.  Distral: Robust multitask reinforcement learning , 2017, NIPS.

[56]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[57]  Yuval Tassa,et al.  Learning human behaviors from motion capture by adversarial imitation , 2017, ArXiv.

[58]  J. Hodgins,et al.  Learning to Schedule Control Fragments for Physics-Based Characters Using Deep Q-Learning , 2017, ACM Trans. Graph..

[59]  Sergey Levine,et al.  DeepMimic , 2018, ACM Trans. Graph..

[60]  Sergey Levine,et al.  Time-Contrastive Networks: Self-Supervised Learning from Video , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[61]  Christian Theobalt,et al.  MonoPerfCap , 2017, ACM Trans. Graph..

[62]  C. Karen Liu,et al.  Learning symmetric and low-energy locomotion , 2018, ACM Trans. Graph..

[63]  Sergey Levine,et al.  Learning Complex Dexterous Manipulation with Deep Reinforcement Learning and Demonstrations , 2017, Robotics: Science and Systems.

[64]  Jitendra Malik,et al.  End-to-End Recovery of Human Shape and Pose , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65]  Sergey Levine,et al.  One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning , 2018, Robotics: Science and Systems.