Learning character-agnostic motion for motion retargeting in 2D

Analyzing human motion is a challenging task with a wide variety of applications in computer vision and graphics. One such application, of particular importance in computer animation, is the retargeting of motion from one performer to another. While humans move in three dimensions, the vast majority of human motions are captured on video, requiring 2D-to-3D pose and camera recovery before existing retargeting approaches can be applied. In this paper, we present a new method for retargeting video-captured motion between different human performers without explicitly reconstructing 3D poses or camera parameters. To achieve this goal, we learn to extract, directly from a video, a high-level latent motion representation that is invariant to the skeleton geometry and the camera view. Our key idea is to train a deep neural network to decompose temporal sequences of 2D poses into three components: motion, skeleton, and camera view-angle. Having extracted such a representation, we can recombine the motion with novel skeletons and camera views and decode a retargeted temporal sequence, which we compare to ground truth from a synthetic dataset. We demonstrate that our framework robustly extracts human motion from videos, bypassing 3D reconstruction and outperforming existing retargeting methods when applied to videos in the wild. It also enables additional applications, such as performance cloning, video-driven cartoons, and motion retrieval.
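
The following is a minimal PyTorch sketch of the decomposition idea described above: a 2D pose sequence is encoded into a time-varying motion code and two static codes (skeleton and view), and retargeting is performed by recombining the motion code of one sequence with the static codes of another before decoding. All layer choices, latent sizes, and names (e.g. `Retargeting2D`, `PoseSequenceEncoder`) are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch of motion/skeleton/view decomposition for 2D retargeting (assumed architecture).
import torch
import torch.nn as nn


class PoseSequenceEncoder(nn.Module):
    """Encodes a 2D pose sequence (2*J joint coordinates over T frames) into a latent code."""

    def __init__(self, in_channels: int, latent_dim: int, time_invariant: bool):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=7, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(64, latent_dim, kernel_size=7, padding=3),
        )
        # Skeleton and view codes are static, so they are pooled over time;
        # the motion code keeps its temporal axis.
        self.time_invariant = time_invariant

    def forward(self, x):                         # x: (batch, in_channels, T)
        z = self.conv(x)                          # (batch, latent_dim, T)
        if self.time_invariant:
            z = z.mean(dim=-1, keepdim=True)      # collapse time -> static code
        return z


class Retargeting2D(nn.Module):
    """Decomposes 2D pose sequences into motion / skeleton / view codes and decodes."""

    def __init__(self, n_joints: int = 15, m_dim: int = 128, s_dim: int = 64, v_dim: int = 16):
        super().__init__()
        in_ch = 2 * n_joints
        self.motion_enc = PoseSequenceEncoder(in_ch, m_dim, time_invariant=False)
        self.skeleton_enc = PoseSequenceEncoder(in_ch, s_dim, time_invariant=True)
        self.view_enc = PoseSequenceEncoder(in_ch, v_dim, time_invariant=True)
        self.decoder = nn.Sequential(
            nn.Conv1d(m_dim + s_dim + v_dim, 128, kernel_size=7, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv1d(128, in_ch, kernel_size=7, padding=3),
        )

    def decode(self, motion, skeleton, view):
        T = motion.shape[-1]
        # Broadcast the static codes along time and decode jointly with the motion code.
        static = torch.cat([skeleton, view], dim=1).expand(-1, -1, T)
        return self.decoder(torch.cat([motion, static], dim=1))

    def forward(self, seq_a, seq_b):
        # Cross-reconstruction: the motion of A rendered with B's skeleton and camera view.
        return self.decode(self.motion_enc(seq_a),
                           self.skeleton_enc(seq_b),
                           self.view_enc(seq_b))


if __name__ == "__main__":
    model = Retargeting2D()
    seq_a = torch.randn(4, 30, 64)   # (batch, 2 * 15 joints, 64 frames)
    seq_b = torch.randn(4, 30, 64)
    out = model(seq_a, seq_b)        # retargeted sequence, same shape as the inputs
    print(out.shape)                 # torch.Size([4, 30, 64])
```

In practice, such a model would be trained with cross-reconstruction losses, comparing the decoded retargeted sequences to ground truth from a synthetic dataset, as described above.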
