Everybody Dance Now

This paper presents a simple method for “do as I do” motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection, which is able to distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer.
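Below is a minimal sketch, not the authors' released implementation, of the pose-to-appearance mapping the abstract describes: a conditional generator takes a rendered pose figure for frame t together with the previously generated frame and outputs the target-subject image, so that two consecutive frames can be produced per step for temporal coherence. The tiny convolutional architecture, tensor sizes, and the random tensors standing in for rendered pose maps are all illustrative assumptions (the actual system builds on much larger image-translation generators and a separate face pipeline).

```python
# Sketch of a pose-to-appearance generator with temporal conditioning.
# NOT the paper's code: layer counts and channel widths are placeholder assumptions.
import torch
import torch.nn as nn

class PoseToAppearanceGenerator(nn.Module):
    """Maps a pose map, conditioned on the previously generated frame, to a target-subject frame."""
    def __init__(self, pose_channels=3, image_channels=3, width=64):
        super().__init__()
        in_ch = pose_channels + image_channels  # pose for frame t + generated frame t-1
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, kernel_size=7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, image_channels, kernel_size=7, padding=3), nn.Tanh(),
        )

    def forward(self, pose_map, prev_frame):
        return self.net(torch.cat([pose_map, prev_frame], dim=1))

# Two consecutive frames are generated per step so a discriminator can judge the pair
# and encourage temporally coherent video, as described above.
G = PoseToAppearanceGenerator()
pose_t = torch.randn(1, 3, 256, 256)    # stand-in for a rendered pose figure at time t
pose_t1 = torch.randn(1, 3, 256, 256)   # stand-in for the pose figure at time t+1
prev = torch.zeros(1, 3, 256, 256)      # zero image used before the first frame
frame_t = G(pose_t, prev)               # generated frame at time t
frame_t1 = G(pose_t1, frame_t)          # frame at time t+1, conditioned on frame t
```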
