Efficient Neural Networks for Real-time Motion Style Transfer

Style is an intrinsic, inescapable part of human motion. It complements the content of motion to convey meaning, mood, and personality. Existing state-of-the-art motion style methods require large quantities of example data and intensive computational resources at runtime. To ensure output quality, such style transfer applications are often run on desktop machines with GPUs and significant memory. In this paper, we present a fast and expressive neural network-based motion style transfer method that generates stylized motion with quality comparable to the state-of-the-art method, but uses much less computational power and a much smaller memory footprint. Our method also allows the output to be adjusted in a latent style space, something not offered in previous approaches. Our style transfer model is implemented using three multi-layered networks: a pose network, a timing network, and a foot-contact network. A one-hot style vector serves as an input control knob and determines the stylistic output of these networks. The networks are trained on a large motion capture database containing heterogeneous actions and various styles. Joint information vectors together with one-hot style vectors are extracted from the motion data and fed to the networks. Once the networks have been trained, the database is no longer needed on the device, thus removing the large memory requirement of previous motion style methods. At runtime, our model takes novel input and allows real-valued numbers to be specified in the style vector, which can be used for interpolation, extrapolation, or mixing of styles. With much lower memory and computational requirements, our networks are efficient and fast enough for real-time use on mobile devices. Because it requires no information about future states, the style transfer can be performed in an online fashion. We validate our results both quantitatively and perceptually, confirming the method's effectiveness and improvement over previous approaches.
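To make the style-conditioning idea concrete, the sketch below shows a minimal, hypothetical style-conditioned feed-forward network and runtime style mixing via a blended style vector. This is an illustration only, not the paper's actual architecture: the class name `PoseNetwork`, all layer sizes, the pose dimensionality, and the concatenation scheme are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a style-conditioned pose network. Layer widths,
# dimensions, and names are illustrative assumptions, not the paper's design.
class PoseNetwork(nn.Module):
    def __init__(self, pose_dim=63, num_styles=8, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + num_styles, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, joint_info, style_vec):
        # Condition the pose output on the style vector by concatenation.
        return self.net(torch.cat([joint_info, style_vec], dim=-1))

# At runtime the one-hot constraint can be relaxed: blending two style
# vectors interpolates between styles; weights outside [0, 1] extrapolate.
num_styles = 8
style_a, style_b = torch.eye(num_styles)[0], torch.eye(num_styles)[1]
mixed_style = 0.7 * style_a + 0.3 * style_b

model = PoseNetwork(num_styles=num_styles)
stylized_pose = model(torch.randn(1, 63), mixed_style.unsqueeze(0))
```

In this sketch, the same trained weights serve every style; only the style vector changes, which is what allows interpolation and mixing without storing the original motion database on the device.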
