One-stage Context and Identity Hallucination Network

Face swapping aims to synthesize a face image in which the facial identity is faithfully transplanted from the source image, while the context (e.g., hairstyle, head pose, facial expression, lighting, and background) remains consistent with the reference image. Prior work mainly accomplishes the task in two stages: it first generates the inner face with the source identity, and then stitches the generation with the complementary part of the reference image using image blending techniques. A blending mask, usually obtained from an additional face segmentation model, is common practice for photo-realistic face swapping. However, artifacts often appear at the blending boundary, especially in areas occluded by hair, eyeglasses, accessories, etc. To address this problem, rather than struggling with the blending mask in the two-stage routine, we develop a novel one-stage context and identity hallucination network, which learns a series of hallucination maps to softly divide the context areas and identity areas. For context areas, the features are fully utilized by a multi-level context encoder. For identity areas, we design a novel two-cascading AdaIN to transfer the identity while retaining the context. In addition, with the help of the hallucination maps, we introduce an improved reconstruction loss that makes use of an unlimited number of unpaired face images for training. Our network performs well on both context areas and identity areas without relying on any post-processing. Extensive qualitative and quantitative experiments demonstrate the superiority of our network.
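The abstract does not spell out the network's internals, so what follows is only a minimal NumPy sketch of two generic operations of the kind it describes. The first is standard AdaIN: per-channel normalization followed by a scale and shift. Here the hypothetical `gamma` and `beta` stand in for parameters that would be predicted from an identity embedding. The second is a soft, per-pixel fusion of context and identity features, weighted by a hallucination map with values in [0, 1]. The function names, the (C, H, W) tensor layout, and the convention that a map value of 1 keeps the context feature are all assumptions made for illustration. The paper's two-cascading AdaIN and multi-level context encoder are not reproduced here.

```python
import numpy as np

def adain(content, gamma, beta, eps=1e-5):
    """Standard AdaIN (illustrative, not the paper's two-cascading variant).

    content: feature map of shape (C, H, W).
    gamma, beta: hypothetical per-channel scale and shift, each of shape (C,).
    """
    # Normalize each channel to zero mean and unit variance over its spatial dims.
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)
    normalized = (content - mean) / std
    # Re-inject the "style" (here: identity) statistics channel by channel.
    return gamma[:, None, None] * normalized + beta[:, None, None]

def fuse_with_hallucination_map(context_feat, identity_feat, h_map):
    """Soft per-pixel blend of two feature maps of shape (C, H, W).

    h_map: values in [0, 1], shape (1, H, W), broadcast over channels.
    Assumed convention: 1 keeps the context feature, 0 keeps the identity feature.
    """
    return h_map * context_feat + (1.0 - h_map) * identity_feat

# Usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
context = rng.normal(size=(C, H, W))
gamma = np.full(C, 2.0)  # hypothetical identity-derived scale
beta = np.full(C, 0.5)   # hypothetical identity-derived shift
identity = adain(context, gamma, beta)
# A sigmoid keeps the hypothetical hallucination map strictly inside (0, 1).
h_map = 1.0 / (1.0 + np.exp(-rng.normal(size=(1, H, W))))
fused = fuse_with_hallucination_map(context, identity, h_map)
```

Because the map is continuous rather than a binary segmentation mask, every pixel receives a weighted mix of both feature streams. This illustrates how a soft division can stand in for the hard blending boundary of the two-stage routine.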
