Neural Kinematic Networks for Unsupervised Motion Retargetting

We propose a recurrent neural network architecture with a Forward Kinematics layer and a cycle-consistency-based adversarial training objective for unsupervised motion retargetting. Our network captures the high-level properties of an input motion through the Forward Kinematics layer and adapts them to a target character whose skeleton has different bone lengths (e.g., shorter or longer arms). Because collecting paired motion training sequences from different characters is expensive, our network instead uses cycle consistency to learn to solve the Inverse Kinematics problem in an unsupervised manner. Our method works online, i.e., it adapts the motion sequence on the fly as new frames are received. In our experiments, we use the Mixamo animation data to test our method on a variety of motions and characters and achieve state-of-the-art results. We also demonstrate motion retargetting from monocular human videos to 3D characters using an off-the-shelf 3D pose estimator.
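To illustrate why a forward kinematics step lets the same motion be transferred across skeletons, here is a minimal sketch in Python with NumPy. It is not the network's actual layer: it assumes rotation matrices rather than any particular rotation parameterization, assumes joints are ordered so that each parent precedes its children, and uses a hypothetical three-joint chain. The names `rot_z`, `forward_kinematics`, `short_arm`, and `long_arm` are illustrative only.

```python
import numpy as np


def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def forward_kinematics(local_rots, offsets, parents):
    """Compute global joint positions from local joint rotations.

    local_rots: list of 3x3 rotation matrices, one per joint.
    offsets:    (n, 3) array, each joint's rest offset from its parent
                (this encodes the bone lengths of the skeleton).
    parents:    parent index per joint, -1 for the root.
    Assumed convention: a joint's offset is rotated by its parent's
    accumulated (global) rotation.
    """
    n = len(parents)
    global_rots = [None] * n
    positions = np.zeros((n, 3))
    for j in range(n):
        p = parents[j]
        if p < 0:
            # Root joint: placed at its own offset, no parent rotation.
            global_rots[j] = local_rots[j]
            positions[j] = offsets[j]
        else:
            positions[j] = positions[p] + global_rots[p] @ offsets[j]
            global_rots[j] = global_rots[p] @ local_rots[j]
    return positions


# Hypothetical 3-joint chain: root -> elbow -> hand.
parents = [-1, 0, 1]
rots = [rot_z(np.pi / 2), rot_z(-np.pi / 2), np.eye(3)]

# Two skeletons with the same topology but different bone lengths.
short_arm = np.array([[0, 0, 0], [1, 0, 0], [1, 0, 0]], dtype=float)
long_arm = 2.0 * short_arm

pos_short = forward_kinematics(rots, short_arm, parents)
pos_long = forward_kinematics(rots, long_arm, parents)
```

Applying the same joint rotations to both skeletons yields the same pose shape at different scales, which is the sense in which rotations act as a skeleton-independent description of the motion. Because every step is a matrix product or a sum, a layer of this form is differentiable and could sit inside a network trained by gradient descent.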
