A Recurrent Variational Autoencoder for Human Motion Synthesis

We propose a novel generative model of human motion that can be trained using a large motion capture dataset, and allows users to produce animations from high-level control signals. As previous architectures struggle to predict motions far into the future due to the inherent ambiguity, we argue that a user-provided control signal is desirable for animators and greatly reduces the predictive error for long sequences. Thus, we formulate a framework which explicitly introduces an encoding of control signals into a variational inference framework trained to learn the manifold of human motion. As part of this framework, we formulate a prior on the latent space, which allows us to generate high-quality motion without providing frames from an existing sequence. We further model the sequential nature of the task by combining samples from a variational approximation to the intractable posterior with the control signal through a recurrent neural network (RNN) that synthesizes the motion. We show that our system can predict the movements of the human body over long horizons more accurately than state-of-theart methods. Finally, the design of our system considers practical use cases and thus provides a competitive approach to motion synthesis.

[1]  Weidi Xu,et al.  Semi-supervised Variational Autoencoders for Sequence Classification , 2016, ArXiv.

[2]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[3]  Samy Bengio,et al.  Generating Sentences from a Continuous Space , 2015, CoNLL.

[4]  Geoffrey E. Hinton,et al.  Factored conditional restricted Boltzmann Machines for modeling motion style , 2009, ICML '09.

[5]  Yong Du,et al.  Hierarchical recurrent neural network for skeleton based action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Taku Komura,et al.  Learning motion manifolds with convolutional autoencoders , 2015, SIGGRAPH Asia Technical Briefs.

[7]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[8]  Silvio Savarese,et al.  Structural-RNN: Deep Learning on Spatio-Temporal Graphs , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[10]  Charles A. Sutton,et al.  A Convolutional Attention Network for Extreme Summarization of Source Code , 2016, ICML.

[11]  Taku Komura,et al.  A Deep Learning Framework for Character Motion Synthesis and Editing , 2016, ACM Trans. Graph..

[12]  Sepp Hochreiter,et al.  Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) , 2015, ICLR.

[13]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[14]  Geoffrey E. Hinton,et al.  Two Distributed-State Models For Generating High-Dimensional Time Series , 2011, J. Mach. Learn. Res..

[15]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[16]  Jitendra Malik,et al.  Recurrent Network Models for Human Dynamics , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Silvio Savarese,et al.  Structured Recurrent Temporal Restricted Boltzmann Machines , 2014, ICML.

[18]  Danica Kragic,et al.  Deep Representation Learning for Human Motion Prediction and Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Geoffrey E. Hinton,et al.  Modeling Human Motion Using Binary Latent Variables , 2006, NIPS.