Explicit Facial Expression Transfer via Fine-Grained Representations

Facial expression transfer between two unpaired images is a challenging problem, as fine-grained expression is typically tangled with other facial attributes. Most existing methods treat expression transfer as an application of expression manipulation, and use predicted global expression, landmarks or action units (AUs) as a guidance. However, the prediction may be inaccurate, which limits the performance of transferring fine-grained expression. Instead of using an intermediate estimated guidance, we propose to explicitly transfer facial expression by directly mapping two unpaired input images to two synthesized images with swapped expressions. Specifically, considering AUs semantically describe fine-grained expression details, we propose a novel multi-class adversarial training method to disentangle input images into two types of fine-grained representations: AU-related feature and AU-free feature. Then, we can synthesize new images with preserved identities and swapped expressions by combining AU-free features with swapped AU-related features. Moreover, to obtain reliable expression transfer results of the unpaired input, we introduce a swap consistency loss to make the synthesized images and self-reconstructed images indistinguishable. Extensive experiments show that our approach outperforms the state-of-the-art expression manipulation methods for transferring fine-grained expressions while preserving other attributes including identity and pose.

[1]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[2]  Shiguang Shan,et al.  AttGAN: Facial Attribute Editing by Only Changing What You Want , 2017, IEEE Transactions on Image Processing.

[3]  Hui Chen,et al.  Emotional facial expression transfer from a single image via generative adversarial nets , 2018, Comput. Animat. Virtual Worlds.

[4]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Xiaoming Liu,et al.  Disentangled Representation Learning GAN for Pose-Invariant Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Katherine B. Martin,et al.  Facial Action Coding System , 2015 .

[7]  Louis-Philippe Morency,et al.  OpenFace 2.0: Facial Behavior Analysis Toolkit , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[8]  Francesc Moreno-Noguer,et al.  GANimation: One-Shot Anatomically Consistent Facial Animation , 2019, International Journal of Computer Vision.

[9]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  J. Fleiss,et al.  Intraclass correlations: uses in assessing rater reliability. , 1979, Psychological bulletin.

[11]  Jianfei Cai,et al.  Facial Action Unit Detection Using Attention and Relation Learning , 2018, IEEE Transactions on Affective Computing.

[12]  Iasonas Kokkinos,et al.  Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance , 2018, ECCV.

[13]  Fei Yang,et al.  Facial expression editing in video using a temporally-smooth factorization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Tieniu Tan,et al.  Geometry Guided Adversarial Facial Expression Synthesis , 2017, ACM Multimedia.

[15]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[16]  Skyler T. Hawk,et al.  Presentation and validation of the Radboud Faces Database , 2010 .

[17]  Andrea Vedaldi,et al.  Instance Normalization: The Missing Ingredient for Fast Stylization , 2016, ArXiv.

[18]  Shijian Lu,et al.  Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[22]  Maja Pantic,et al.  Web-based database for facial expression analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[23]  Jianfei Cai,et al.  Deep Adaptive Attention for Joint Facial Action Unit Detection and Face Alignment , 2018, ECCV.

[24]  Jiaya Jia,et al.  Homomorphic Latent Space Interpolation for Unpaired Image-To-Image Translation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Fei Yang,et al.  Expression flow for 3D-aware face component transfer , 2011, ACM Trans. Graph..

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Jianfei Cai,et al.  Pluralistic Image Completion , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[29]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[30]  Joshua Correll,et al.  The Chicago face database: A free stimulus set of faces and norming data , 2015, Behavior research methods.

[31]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[32]  Mohammad H. Mahoor,et al.  DISFA: A Spontaneous Facial Action Intensity Database , 2013, IEEE Transactions on Affective Computing.

[33]  M. Pantic,et al.  Induced Disgust , Happiness and Surprise : an Addition to the MMI Facial Expression Database , 2010 .

[34]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[35]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Patrick Pérez,et al.  Automatic Face Reenactment , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[38]  Rama Chellappa,et al.  ExprGAN: Facial Expression Editing with Controllable Expression Intensity , 2017, AAAI.

[39]  Vladimir Pavlovic,et al.  Deep Structured Learning for Facial Action Unit Intensity Estimation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[41]  P. Ekman,et al.  Facial action coding system: a technique for the measurement of facial movement , 1978 .