Diverse Motion Stylization for Multiple Style Domains via Spatial-Temporal Graph-Based Generative Model

This paper presents a novel deep learning-based framework for translating a motion into various styles across multiple domains. Our framework is a single set of generative adversarial networks that learns stylistic features from a collection of unpaired motion clips with style labels, and it supports mapping between multiple style domains. We model a motion sequence as a spatial-temporal graph and employ spatial-temporal graph convolutional networks (ST-GCN) to extract stylistic properties along both the spatial and temporal dimensions. Through this spatial-temporal modeling, our framework produces improved style translation results between significantly different actions and on long motion sequences containing multiple actions. In addition, we are the first to introduce a mapping network for motion stylization that maps random noise to a style code, which allows diverse stylization results to be generated without reference motions. Through various experiments, we demonstrate that our method generates improved results in terms of visual quality, stylistic diversity, and content preservation.
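To make the two components named in the abstract concrete, the following is a minimal sketch, assuming a PyTorch-style implementation: a spatial-temporal graph convolution block operating on a skeleton graph, and a mapping network that turns random noise into a per-domain style code. The class names, layer sizes, placeholder adjacency, and number of style domains are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    # One spatial-temporal graph convolution: a spatial graph convolution over
    # the skeleton adjacency, followed by a temporal 1D convolution over frames.
    def __init__(self, in_channels, out_channels, num_joints, t_kernel=9):
        super().__init__()
        self.spatial = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.temporal = nn.Conv2d(out_channels, out_channels,
                                  kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))
        self.relu = nn.ReLU(inplace=True)
        # Placeholder skeleton adjacency (identity) with a learnable edge-importance mask.
        self.register_buffer("adj", torch.eye(num_joints))
        self.edge_weight = nn.Parameter(torch.ones(num_joints, num_joints))

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        x = torch.einsum("nctv,vw->nctw", x, self.adj * self.edge_weight)
        return self.relu(self.temporal(x))

class MappingNetwork(nn.Module):
    # Maps a random noise vector to a per-domain style code, so diverse
    # stylizations can be sampled without a reference motion.
    def __init__(self, noise_dim=16, style_dim=64, num_domains=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(),
                                    nn.Linear(256, 256), nn.ReLU())
        # One output head per style domain.
        self.heads = nn.ModuleList(
            [nn.Linear(256, style_dim) for _ in range(num_domains)])

    def forward(self, z, domain):
        return self.heads[domain](self.shared(z))

# Usage: extract spatial-temporal features from a 60-frame, 21-joint clip and
# sample a style code for domain 2 from Gaussian noise.
motion = torch.randn(1, 3, 60, 21)                  # (batch, xyz, frames, joints)
features = STGCNBlock(3, 64, num_joints=21)(motion)
style = MappingNetwork()(torch.randn(1, 16), domain=2)

In a setup of this kind, the sampled style code would typically condition the generator, for example through adaptive instance normalization, while the content of the input motion is preserved; the exact injection mechanism shown here is an assumption, not the authors' implementation.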
