High Fidelity Face Manipulation with Extreme Pose and Expression

Face manipulation has shown remarkable advances with the flourish of Generative Adversarial Networks. However, due to the difficulties of controlling the structure and texture in high-resolution, it is challenging to simultaneously model pose and expression during manipulation. In this paper, we propose a novel framework that simplifies face manipulation with extreme pose and expression into two correlated stages: a boundary prediction stage and a disentangled face synthesis stage. In the first stage, we propose to use a boundary image for joint pose and expression modeling. An encoder-decoder network is employed to predict the boundary image of the target face in a semi-supervised way. Pose and expression estimators are used to improve the prediction accuracy. In the second stage, the predicted boundary image and the original face are encoded into the structure and texture latent space by two encoder networks respectively. A proxy network and a feature threshold loss are further imposed as constraints to disentangle the latent space. In addition, we build up a new high quality Multi-View Face (MVF-HQ) database that contains 120K high-resolution face images of 479 identities with pose and expression variations, which will be released soon. Qualitative and quantitative experiments on four databases show that our method pushes forward the advance of extreme face manipulation from 128 $\times$ 128 resolution to 1024 $\times$ 1024 resolution, and significantly improves the face recognition performance under large poses.

[1]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[2]  Georgios Tzimiropoulos,et al.  How Far are We from Solving the 2D & 3D Face Alignment Problem? (and a Dataset of 230,000 3D Facial Landmarks) , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Fang Zhao,et al.  Towards Pose Invariant Face Recognition in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[5]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[7]  Zhenan Sun,et al.  Global and Local Consistent Wavelet-Domain Age Synthesis , 2018, IEEE Transactions on Information Forensics and Security.

[8]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[9]  Fumin Shen,et al.  Make a Face: Towards Arbitrary High Fidelity Face Manipulation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[11]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[12]  Fei Yang,et al.  Expression flow for 3D-aware face component transfer , 2011, ACM Trans. Graph..

[13]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[14]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Yuxiao Hu,et al.  MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.

[16]  Timo Aila,et al.  A Style-Based Generator Architecture for Generative Adversarial Networks , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Louis-Philippe Morency,et al.  OpenFace 2.0: Facial Behavior Analysis Toolkit , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[18]  Qirong Mao,et al.  A Unified Deep Model for Joint Facial Expression Recognition, Face Synthesis, and Face Alignment , 2020, IEEE Transactions on Image Processing.

[19]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[20]  Francesc Moreno-Noguer,et al.  GANimation: Anatomically-aware Facial Animation from a Single Image , 2018, ECCV.

[21]  Youngjoo Jo,et al.  SC-FEGAN: Face Editing Generative Adversarial Network With User’s Sketch and Color , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[23]  Shuicheng Yan,et al.  Look Across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition , 2018, AAAI.

[24]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[25]  Hans-Peter Seidel,et al.  Computer‐Suggested Facial Makeup , 2011, Comput. Graph. Forum.

[26]  Richard S. Zemel,et al.  Generative Moment Matching Networks , 2015, ICML.

[27]  Tieniu Tan,et al.  IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis , 2018, NeurIPS.

[28]  Yi Li,et al.  Pose Agnostic Cross-spectral Hallucination via Disentangling Independent Factors , 2019, ArXiv.

[29]  Xiaogang Wang,et al.  FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Iasonas Kokkinos,et al.  Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance , 2018, ECCV.

[31]  Oriol Vinyals,et al.  Neural Discrete Representation Learning , 2017, NIPS.

[32]  Xiangyu Zhu,et al.  Face Alignment in Full Pose Range: A 3D Total Solution , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Ran He,et al.  Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Zhenan Sun,et al.  Pose-Guided Photorealistic Face Rotation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[36]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Victor Lempitsky,et al.  Few-Shot Adversarial Learning of Realistic Neural Talking Head Models , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Ira Kemelmacher-Shlizerman,et al.  Illumination-Aware Age Progression , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Terence Sim,et al.  The CMU Pose, Illumination, and Expression (PIE) database , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[40]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[44]  Jan Kautz,et al.  Video-to-Video Synthesis , 2018, NeurIPS.

[45]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[46]  Gang Hua,et al.  Towards Open-Set Identity Preserving Face Synthesis , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Changsheng Xu,et al.  Joint Pose and Expression Modeling for Facial Expression Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Zhenan Sun,et al.  Disentangled Variational Representation for Heterogeneous Face Recognition , 2018, AAAI.

[49]  Kaiming He,et al.  Data Distillation: Towards Omni-Supervised Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Zhe L. Lin,et al.  Semantic Component Decomposition for Face Attribute Manipulation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51]  Tieniu Tan,et al.  Geometry Guided Adversarial Facial Expression Synthesis , 2017, ACM Multimedia.

[52]  Wen Gao,et al.  The CAS-PEAL Large-Scale Chinese Face Database and Baseline Evaluations , 2008, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[53]  Songyao Jiang,et al.  Dual-Attention GAN for Large-Pose Face Frontalization , 2020, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).

[54]  Jung-Woo Ha,et al.  StarGAN v2: Diverse Image Synthesis for Multiple Domains , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Stefanos Zafeiriou,et al.  Face Video Generation from a Single Image and Landmarks , 2019, 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020).

[56]  Guillermo Sapiro,et al.  Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-Spectral Hallucination and Low-Rank Embedding , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Samy Bengio,et al.  Density estimation using Real NVP , 2016, ICLR.

[58]  Junjie Jia,et al.  Unsupervised Representation Learning of Image-Based Plant Disease with Deep Convolutional Generative Adversarial Networks , 2018, 2018 37th Chinese Control Conference (CCC).

[59]  Luc Van Gool,et al.  Pose Guided Person Image Generation , 2017, NIPS.

[60]  Hugo Proença,et al.  BioHDD: a dataset for studying biometric identification on heavily degraded data , 2015, IET Biom..

[61]  Jan Kautz,et al.  MoCoGAN: Decomposing Motion and Content for Video Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[62]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[63]  Tieniu Tan,et al.  Wasserstein CNN: Learning Invariant Features for NIR-VIS Face Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[65]  Skyler T. Hawk,et al.  Presentation and validation of the Radboud Faces Database , 2010 .

[66]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[67]  Alex Graves,et al.  Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.

[68]  Ali Razavi,et al.  Generating Diverse High-Fidelity Images with VQ-VAE-2 , 2019, NeurIPS.

[69]  Stefanos Zafeiriou,et al.  UV-GAN: Adversarial Facial UV Map Completion for Pose-Invariant Face Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[70]  Ran He,et al.  Dual Variational Generation for Low-Shot Heterogeneous Face Recognition , 2019, NeurIPS.

[71]  Yu Cheng,et al.  3D-Aided Deep Pose-Invariant Face Recognition , 2018, IJCAI.

[72]  Du-Sik Park,et al.  Rotating your face using multi-task deep neural network , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Mehran Ebrahimi,et al.  EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning , 2019, ArXiv.

[74]  Gang Hua,et al.  Face Relighting from a Single Image under Arbitrary Unknown Lighting Conditions , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[75]  Justus Thies,et al.  Real-time expression transfer for facial reenactment , 2015, ACM Trans. Graph..

[76]  Adam Finkelstein,et al.  PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[77]  Jaakko Lehtinen,et al.  Analyzing and Improving the Image Quality of StyleGAN , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Nash Equilibrium , 2017, ArXiv.

[79]  Tomaso A. Poggio,et al.  Reanimating Faces in Images and Video , 2003, Comput. Graph. Forum.

[80]  P. Ekman,et al.  Facial action coding system: a technique for the measurement of facial movement , 1978 .

[81]  Yiying Tong,et al.  FaceWarehouse: A 3D Facial Expression Database for Visual Computing , 2014, IEEE Transactions on Visualization and Computer Graphics.

[82]  Bolei Zhou,et al.  Interpreting the Latent Space of GANs for Semantic Face Editing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[83]  Xiaoming Liu,et al.  Coefficients Pose-Variant Input Recogni 8 on Engine Frontalized Output Generator FF-GAN D Discriminator Extreme Pose Input Frontalized Output , 2017 .