Paper Information - Generating Video from Single Image and Sound

In this paper, we propose a method for generating a sound-synchronized video from a single image and a few seconds of sound while preserving the appearance of the image. Conventional methods for generating video from sound require extracting sound-related key points for each object, such as the mouth for speech or the arms for a musical instrument performance. They cannot be applied to objects whose shape changes significantly, such as fireworks. The proposed method can generate a video without extracting specific key points from images. We experimented not only with human mouth shapes and body poses, which conventional methods handle, but also with fireworks and sea waves, for which key points are difficult to design.
