CPTNet: Cascade Pose Transform Network for Single Image Talking Head Animation

We study the problem of talking head animation from a single image. Most existing methods focus on generating talking heads for human faces, while little attention has been paid to creating talking heads for anime characters. In this paper, our goal is to synthesize vivid talking heads from a single anime image. To this end, we propose the Cascade Pose Transform Network, termed CPTNet, which consists of a face pose transform network and a head pose transform network. Specifically, we introduce a mask generator to animate facial expressions (e.g., closing the eyes and opening the mouth) and a grid generator for head movement animation, followed by a fusion module that combines the two to produce the final talking head. To handle large motions and obtain more accurate results, we design a pose vector decomposition and cascaded refinement strategy. In addition, we build an anime talking head dataset that covers a variety of characters and poses to train our model. Extensive experiments on this dataset demonstrate that our model outperforms other methods, generating more accurate and vivid talking heads from a single anime image.
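The abstract describes three mechanisms: a mask-based blend for facial expression changes, a grid-based warp for head movement, and a cascaded refinement that decomposes the pose vector into smaller steps. The sketch below illustrates these three ideas in minimal numpy form; the function names, the equal-step pose decomposition, and the nearest-neighbour sampling are illustrative assumptions, not the paper's actual (learned, network-based) implementation.

```python
import numpy as np

def mask_blend(image, change, mask):
    # Face pose transform idea: a soft mask in [0, 1] selects where the
    # predicted change image (e.g., closed eyes, open mouth) replaces
    # the input; elsewhere the input is kept unchanged.
    return mask * change + (1.0 - mask) * image

def grid_warp(image, grid):
    # Head pose transform idea: resample the input at per-pixel source
    # coordinates given by a sampling grid (appearance-flow style).
    # Nearest-neighbour sampling here for simplicity; a real model
    # would use bilinear sampling of learned, fractional coordinates.
    h, w = image.shape[:2]
    ys = np.clip(grid[..., 0], 0, h - 1).astype(int)
    xs = np.clip(grid[..., 1], 0, w - 1).astype(int)
    return image[ys, xs]

def cascade(image, pose, steps, apply_fn):
    # Cascaded refinement idea: decompose a large pose change into
    # `steps` equal sub-poses and apply the transform repeatedly,
    # so each stage only has to handle a small motion.
    sub_pose = pose / steps
    out = image
    for _ in range(steps):
        out = apply_fn(out, sub_pose)
    return out
```

For example, with an `apply_fn` that shifts the image by its pose argument, `cascade(img, pose, 4, apply_fn)` applies four quarter-size shifts whose composition equals the full pose change.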
