Comp-GAN: Compositional Generative Adversarial Network in Synthesizing and Recognizing Facial Expression

Facial expression is important in understanding our social interaction. Thus the ability to recognize facial expression enables the novel multimedia applications. With the advance of recent deep architectures, research on facial expression recognition has achieved great progress. However, these models are still suffering from the problems of lacking sufficient and diverse high quality training faces, vulnerability to the facial variations, and recognizing a limited number of basic types of emotions. To tackle these problems, this paper proposes a novel end-to-end Compositional Generative Adversarial Network (Comp-GAN) that is able to synthesize new face images with specified poses and desired facial expressions; and such synthesized images can be further utilized to help train a robust and generalized expression recognition model. Essentially, Comp-GAN can dynamically change the expression and pose of faces according to the input images while keeping the identity information. Specifically, the generator has two major components: one for generating images with desired expression and the other for changing the pose of faces. Furthermore, a face reconstruction learning process is applied to re-generate the input image and constrains the generator for preserving the key information such as facial identity. For the first time, various one/zero-shot facial expression recognition tasks have been created. We conduct extensive experiments to show that the images generated by Comp-GAN are helpful to improve the performance of one/zero-shot facial expression recognition.

[1]  Min-Ling Zhang,et al.  A Review on Multi-Label Learning Algorithms , 2014, IEEE Transactions on Knowledge and Data Engineering.

[2]  Tao Xiang,et al.  Pose-Normalized Image Generation for Person Re-identification , 2017, ECCV.

[3]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[4]  Xiangyang Xue,et al.  Multi-task Deep Neural Network for Joint Face Recognition and Facial Attribute Prediction , 2017, ICMR.

[5]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.

[8]  Janusz Konrad,et al.  VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Shang-Hong Lai,et al.  Emotion-Preserving Representation Learning via Generative Adversarial Network for Multi-View Facial Expression Recognition , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[10]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[11]  Lijun Yin,et al.  Identity-Adaptive Facial Expression Recognition through Expression Regeneration Using Conditional Generative Adversarial Networks , 2018, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[12]  Matti Pietikäinen,et al.  Gray Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2000, ECCV.

[13]  Kaiqi Huang,et al.  A Richly Annotated Dataset for Pedestrian Attribute Recognition , 2016, ArXiv.

[14]  Shengcai Liao,et al.  Learning Face Representation from Scratch , 2014, ArXiv.

[15]  Wei Liu,et al.  DeepProduct: Mobile Product Search With Portable Deep Features , 2018, ACM Trans. Multim. Comput. Commun. Appl..

[16]  Lijun Yin,et al.  Facial Expression Recognition by De-expression Residue Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Tao Xiang,et al.  Multi-scale Deep Learning Architectures for Person Re-identification , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Mohammad H. Mahoor,et al.  Going deeper in facial expression recognition using deep neural networks , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[19]  Shih-Fu Chang,et al.  Modeling Multimodal Clues in a Hybrid Deep Learning Framework for Video Classification , 2017, IEEE Transactions on Multimedia.

[20]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Tieniu Tan,et al.  A Light CNN for Deep Face Representation With Noisy Labels , 2015, IEEE Transactions on Information Forensics and Security.

[22]  Shiguang Shan,et al.  AttGAN: Facial Attribute Editing by Only Changing What You Want , 2017, IEEE Transactions on Image Processing.

[23]  Shan Li,et al.  Deep Facial Expression Recognition: A Survey , 2018, IEEE Transactions on Affective Computing.

[24]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[25]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[26]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[28]  Thomas S. Huang,et al.  Do Deep Neural Networks Learn Facial Action Units When Doing Expression Recognition? , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[29]  Kunio Kashino,et al.  Adaptive Visual Feedback Generation for Facial Expression Improvement with Multi-task Deep Neural Networks , 2016, ACM Multimedia.

[30]  P. Ekman,et al.  What the face reveals : basic and applied studies of spontaneous expression using the facial action coding system (FACS) , 2005 .

[31]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[32]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[33]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[35]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[36]  Changsheng Xu,et al.  Joint Pose and Expression Modeling for Facial Expression Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[38]  Qiang Sun,et al.  A Fine-Grained Facial Expression Database for End-to-End Multi-Pose Facial Expression Recognition , 2019, ArXiv.

[39]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[40]  Liang Zheng,et al.  Improving Person Re-identification by Attribute and Identity Learning , 2017, Pattern Recognit..

[41]  Ole Winther,et al.  Autoencoding beyond pixels using a learned similarity metric , 2015, ICML.

[42]  Shervin Minaee,et al.  Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network , 2019, Sensors.

[43]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44]  Tao Xiang,et al.  Leader-Based Multi-Scale Attention Deep Architecture for Person Re-Identification , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.