U-Net Conditional GANs for Photo-Realistic and Identity-Preserving Facial Expression Synthesis

Facial expression synthesis (FES) is a challenging task since the expression changes are highly non-linear and depend on the facial appearance. Person identity should also be well preserved in the synthesized face. In this article, we present a novel U-Net Conditional Generative Adversarial Network for FES. U-Net helps retain the property of the input face, including the identity information and facial details. Category condition is added to the U-Net model so that one-to-many expression synthesis can be achieved simultaneously. We also design constraints for identity preservation during FES to further guarantee that the identity of the input face can be well preserved in the generated face image. Specifically, we pair the generated output with condition image of other identities for the discriminator, so as to encourage it to learn the distinctions between the synthesized and natural images, as well as between input and other identities, which can help improve its discriminating ability. Additionally, we utilize the triplet loss to maintain the generated face images closer to the same identity person by imposing a margin between the positive pairs and negative pairs in feature space. Both qualitative and quantitative evaluations are conducted on the Oulu-CASIA NIR8VIS facial expression database, the Radboud Faces Database, and the Karolinska Directed Emotional Faces database, and the experimental results show that our method can generate faces with natural and realistic expressions while preserving identity information.

[1]  Matti Pietikäinen,et al.  Facial expression recognition from near-infrared videos , 2011, Image Vis. Comput..

[2]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Justus Thies,et al.  Face2Face: real-time face capture and reenactment of RGB videos , 2019, Commun. ACM.

[4]  Thaddeus Beier,et al.  Feature-based image metamorphosis , 1998 .

[5]  Jinhui Tang,et al.  Single Image Dehazing via Conditional Generative Adversarial Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Zicheng Liu,et al.  Expressive expression mapping with ratio images , 2001, SIGGRAPH.

[7]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[8]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jon Gauthier Conditional generative adversarial nets for convolutional face generation , 2015 .

[10]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  D. Lundqvist,et al.  Karolinska Directed Emotional Faces , 2015 .

[13]  Francesc Moreno-Noguer,et al.  GANimation: Anatomically-aware Facial Animation from a Single Image , 2018, ECCV.

[14]  Alexander M. Bronstein,et al.  Expression-Invariant 3D Face Recognition , 2003, AVBPA.

[15]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Lucas Beyer,et al.  In Defense of the Triplet Loss for Person Re-Identification , 2017, ArXiv.

[17]  David Zhang,et al.  Deep Identity-aware Transfer of Facial Attributes , 2016, ArXiv.

[18]  Steven M. Seitz,et al.  View morphing , 1996, SIGGRAPH.

[19]  Fisher Yu,et al.  TextureGAN: Controlling Deep Image Synthesis with Texture Patches , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Jane You,et al.  Hard negative generation for identity-disentangled facial expression recognition , 2019, Pattern Recognit..

[21]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[23]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[24]  Ziwei Liu,et al.  Semantic Facial Expression Editing using Autoencoded Flow , 2016, ArXiv.

[25]  Anil K. Jain,et al.  Learning Face Age Progression: A Pyramid Architecture of GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Jonathon Shlens,et al.  Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.

[27]  Chao Yang,et al.  Normalized face image generation with perceptron generative adversarial networks , 2018, 2018 IEEE 4th International Conference on Identity, Security, and Behavior Analysis (ISBA).

[28]  Skyler T. Hawk,et al.  Presentation and validation of the Radboud Faces Database , 2010 .

[29]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[31]  Bruno A. Olshausen,et al.  Discovering Hidden Factors of Variation in Deep Networks , 2014, ICLR.

[32]  Hao Zhang,et al.  Expression-Invariant Face Recognition with Expression Classification , 2006, The 3rd Canadian Conference on Computer and Robot Vision (CRV'06).

[33]  Yunhong Wang,et al.  Facial Expression Synthesis by U-Net Conditional Generative Adversarial Networks , 2018, ICMR.

[34]  Rama Chellappa,et al.  ExprGAN: Facial Expression Editing with Controllable Expression Intensity , 2017, AAAI.

[35]  Chao Yang,et al.  Realistic Dynamic Facial Textures from a Single Image Using GANs , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Christine L. Lisetti,et al.  Automatic facial expression interpretation: Where human-computer interaction, artificial intelligence and cognitive science intersect , 2000 .

[37]  Ran He,et al.  Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Justus Thies,et al.  Demo of Face2Face: real-time face capture and reenactment of RGB videos , 2016, SIGGRAPH Emerging Technologies.

[39]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[40]  Shree K. Nayar,et al.  FaceTracer: A Search Engine for Large Collections of Images with Faces , 2008, ECCV.

[41]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[42]  Geoffrey E. Hinton,et al.  Generating Facial Expressions with Deep Belief Nets , 2008 .

[43]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Takeo Kanade,et al.  Detection, tracking, and classification of action units in facial expression , 2000, Robotics Auton. Syst..

[45]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[46]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[47]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[48]  Baining Guo,et al.  Geometry-driven photorealistic facial expression synthesis , 2003, IEEE Transactions on Visualization and Computer Graphics.

[49]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[50]  Bertram E. Shi,et al.  Photorealistic facial expression synthesis by the conditional difference adversarial autoencoder , 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII).

[51]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[52]  Tieniu Tan,et al.  Geometry Guided Adversarial Facial Expression Synthesis , 2017, ACM Multimedia.