Learning to disentangle emotion factors for facial expression recognition in the wild

Facial expression recognition (FER) in the wild is challenging because expressions appear under complex conditions (e.g., large head poses, illumination variation, and occlusions), which leads to suboptimal FER performance. FER accuracy relies heavily on discovering discriminative, emotion‐related features. In this paper, we propose an end‐to‐end module that disentangles latent emotion‐discriminative factors from the other complex factors of variation to obtain salient emotion features. Training of the proposed method has two stages. First, emotion samples are encoded into a latent representation using a variational auto‐encoder with a reconstruction penalty. Second, the latent representation is fed into a disentangling layer that learns a set of discriminative emotion factors through an attention mechanism (e.g., a Squeeze‐and‐Excitation block), which encourages the separation of emotion‐related factors from nonaffective factors. Experimental results on public benchmark databases (RAF‐DB and FER2013) show that our approach performs better in complex scenes than current state‐of‐the‐art methods.
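The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the latent size `d`, the reduction ratio `r`, the random weights `w1` and `w2`, and all function names are assumptions made for illustration. It samples a latent vector with the standard VAE reparameterization, computes the usual KL term against a standard normal prior, and then applies a Squeeze‐and‐Excitation‐style gate directly to the latent vector. Because a latent vector has no spatial dimensions, the "squeeze" pooling step of a regular SE block is omitted here.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (VAE reparameterization)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def se_gate(z, w1, w2):
    """SE-style gating: bottleneck + ReLU, expand + sigmoid, then rescale z."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]            # excitation, reduced size
    gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]  # weights in (0, 1)
    return [g * zi for g, zi in zip(gate, z)], gate


# Hypothetical sizes and random parameters, for illustration only.
rng = random.Random(0)
d, r = 8, 4
mu = [rng.gauss(0.0, 1.0) for _ in range(d)]
log_var = [0.1 * rng.gauss(0.0, 1.0) for _ in range(d)]

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)

w1 = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d // r)]  # (d/r) x d
w2 = [[0.1 * rng.gauss(0.0, 1.0) for _ in range(d // r)] for _ in range(d)]  # d x (d/r)
z_emotion, gate = se_gate(z, w1, w2)
```

In a trained model, gate values near 1 would mark latent dimensions kept as emotion‐related and values near 0 would suppress nonaffective ones; with the random weights used here the gate only demonstrates the mechanics.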
