Auto-Conditioned Recurrent Networks for Extended Complex Human Motion Synthesis

We present a real-time method for synthesizing highly complex human motions using a novel training regime we call the auto-conditioned Recurrent Neural Network (acRNN). Recently, researchers have attempted to synthesize new motion by using autoregressive techniques, but existing methods tend to freeze or diverge after a couple of seconds due to an accumulation of errors that are fed back into the network. Furthermore, such methods have only been shown to be reliable for relatively simple human motions, such as walking or running. In contrast, our approach can synthesize arbitrary motions with highly complex styles, including dances and martial arts in addition to locomotion. The acRNN accomplishes this by explicitly accounting for autoregressive noise accumulation during training. To our knowledge, our work is the first to demonstrate the generation of over 18,000 continuous frames (300 seconds) of new complex human motion across different styles.
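The core idea, as described above, is to expose the network to its own autoregressive noise during training by alternating between feeding it ground-truth frames and its own predictions. The sketch below illustrates one plausible such conditioning schedule; the function name and the block-length parameters are illustrative assumptions, not the paper's actual interface.

```python
def auto_condition_schedule(seq_len, gt_block, self_block):
    """Hypothetical auto-conditioning schedule: for each timestep, decide
    whether the RNN input is the ground-truth frame ("gt") or the
    network's own previous prediction ("self"), alternating in fixed
    blocks so the model learns to recover from its own errors."""
    sources = []
    t = 0
    while t < seq_len:
        # Feed gt_block ground-truth frames...
        for _ in range(gt_block):
            if t >= seq_len:
                break
            sources.append("gt")
            t += 1
        # ...then condition on self_block of the model's own outputs.
        for _ in range(self_block):
            if t >= seq_len:
                break
            sources.append("self")
            t += 1
    return sources
```

During training, timesteps marked `"self"` would receive the network's previous output instead of the ground-truth frame, so the loss is computed under the same noisy conditions the model faces at synthesis time.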
