Combining Recurrent Neural Networks and Adversarial Training for Human Motion Synthesis and Control

This paper introduces a new generative deep learning network for human motion synthesis and control. Our key idea is to combine recurrent neural networks (RNNs) and adversarial training for human motion modeling. We first describe an efficient method for training an RNN model from prerecorded motion data. We implement the RNNs with long short-term memory (LSTM) cells because they can capture the nonlinear dynamics and long-term temporal dependencies present in human motion. Next, we train a refiner network using an adversarial loss, similar to generative adversarial networks (GANs), so that a discriminative network cannot distinguish refined motion sequences from real motion capture (mocap) data. The resulting model is appealing for motion synthesis and control because it is compact, contact-aware, and can generate an unlimited number of natural-looking motions of arbitrary length. Our experiments show that motions generated by our deep learning model are consistently highly realistic and comparable to high-quality mocap data. We demonstrate the power and effectiveness of our models by exploring a variety of applications, including random motion synthesis, online and offline motion control, and motion filtering. We show the superiority of our generative model through comparisons against baseline models.
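The two ingredients named above, an LSTM that synthesizes motion frame by frame and a refiner trained against a discriminator, can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the paper's architecture or training code: the single-layer cell with random, untrained weights, the linear readout `W_out`, the pose dimension, the helper names (`LSTMCell`, `rollout`, `discriminator_loss`, `refiner_loss`) and the L1 regularization weight `lam` (borrowed from SimGAN-style refiners) are all hypothetical. The loss functions take discriminator output probabilities rather than networks, so any refiner and discriminator could be plugged in.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class LSTMCell:
    """One LSTM cell with random, untrained weights (hypothetical, illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Stacked weights for the input, forget and output gates and the candidate state.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, input_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        H = self.hidden_dim
        z = self.W @ np.concatenate([x, h]) + self.b
        i = sigmoid(z[:H])            # input gate
        f = sigmoid(z[H:2 * H])       # forget gate: how much old cell state to keep
        o = sigmoid(z[2 * H:3 * H])   # output gate
        g = np.tanh(z[3 * H:])        # candidate cell state
        c = f * c + i * g             # cell state carries long-term temporal context
        h = o * np.tanh(c)
        return h, c


def rollout(cell, W_out, first_frame, num_frames):
    """Autoregressive synthesis: each predicted pose is fed back as the next input."""
    h = np.zeros(cell.hidden_dim)
    c = np.zeros(cell.hidden_dim)
    x = first_frame
    frames = [x]
    for _ in range(num_frames - 1):
        h, c = cell.step(x, h, c)
        x = W_out @ h                 # hypothetical linear readout to the next pose vector
        frames.append(x)
    return np.stack(frames)           # shape (num_frames, pose_dim)


def discriminator_loss(d_real, d_refined, eps=1e-8):
    """Standard GAN discriminator loss; inputs are D's probabilities of 'real mocap'."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_refined + eps))


def refiner_loss(d_refined, refined, raw, lam=0.1, eps=1e-8):
    """Non-saturating adversarial term plus an ASSUMED L1 term keeping output near the raw input."""
    adversarial = -np.mean(np.log(d_refined + eps))
    regularization = lam * np.mean(np.abs(refined - raw))
    return adversarial + regularization


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cell = LSTMCell(input_dim=6, hidden_dim=16, rng=rng)
    W_out = rng.normal(0.0, 0.1, size=(6, 16))
    motion = rollout(cell, W_out, rng.normal(size=6), num_frames=120)
    print(motion.shape)  # prints (120, 6)
```

Because `rollout` feeds each predicted pose back as its next input, `num_frames` is not tied to the length of any training clip; this also means the pose vector must have the same dimension on input and output. The L1 term in `refiner_loss` is an assumption added so the refiner cannot drift arbitrarily far from the RNN output; the abstract itself only specifies an adversarial loss.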