Discriminative attention-augmented feature learning for facial expression recognition in the wild

Facial expression recognition (FER) in-the-wild is challenging due to unconstraint settings such as varying head poses, illumination, and occlusions. In addition, the performance of a FER system significantly degrades due to large intra-class variation and inter-class similarity of facial expressions in real-world scenarios. To mitigate these problems, we propose a novel approach, Discriminative Attention-augmented Feature Learning Convolution Neural Network (DAF-CNN), which learns discriminative expression-related representations for FER. Firstly, we develop a 3D attention mechanism for feature refinement which selectively focuses on attentive channel entries and salient spatial regions of a convolution neural network feature map. Moreover, a deep metric loss termed Triplet-Center (TC) loss is incorporated to further enhance the discriminative power of the deeply-learned features with an expression-similarity constraint. It simultaneously minimizes intra-class distance and maximizes inter-class distance to learn both compact and separate features. Extensive experiments have been conducted on two representative facial expression datasets (FER-2013 and SFEW 2.0) to demonstrate that DAF-CNN effectively captures discriminative feature representations and achieves competitive or even superior FER performance compared to state-of-the-art FER methods.

[1]  Maja Pantic,et al.  Web-based database for facial expression analysis , 2005, 2005 IEEE International Conference on Multimedia and Expo.

[2]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[3]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[4]  Holger Hoffmann,et al.  Mapping discrete emotions into the dimensional space: An empirical approach , 2012, 2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC).

[5]  Antonio Plaza,et al.  High-Order Self-Attention Network for Remote Sensing Scene Classification , 2019, IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium.

[6]  Hang Yin,et al.  Learning Robust Discriminant Subspace Based on Joint L₂,ₚ- and L₂,ₛ-Norm Distance Metrics , 2020, IEEE Transactions on Neural Networks and Learning Systems.

[7]  Cuong Tuan Nguyen,et al.  Attention Augmented Convolutional Recurrent Network for Handwritten Japanese Text Recognition , 2020, 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR).

[8]  Mengna Zhou,et al.  Facial Expression Sequence Interception Based on Feature Point Movement , 2019, 2019 IEEE 11th International Conference on Advanced Infocomm Technology (ICAIT).

[9]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[10]  Guan Gui,et al.  HERO: Human Emotions Recognition for Realizing Intelligent Internet of Things , 2019, IEEE Access.

[11]  Jianfei Cai,et al.  Facial Motion Prior Networks for Facial Expression Recognition , 2019, 2019 IEEE Visual Communications and Image Processing (VCIP).

[12]  Takeo Kanade,et al.  The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[13]  Zhiyuan Li,et al.  Island Loss for Learning Discriminative Features in Facial Expression Recognition , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[14]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[16]  Tamás D. Gedeon,et al.  Video and Image based Emotion Recognition Challenges in the Wild: EmotiW 2015 , 2015, ICMI.

[17]  Yang Yang,et al.  Cross-domain facial expression recognition via an intra-category common feature and inter-category Distinction feature fusion network , 2019, Neurocomputing.

[18]  Junping Du,et al.  Reliable Crowdsourcing and Deep Locality-Preserving Learning for Expression Recognition in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  LuceySimon,et al.  Collecting Large, Richly Annotated Facial-Expression Databases from Movies , 2012 .

[20]  Graham W. Taylor,et al.  Multi-task Learning of Facial Landmarks and Expression , 2014, 2014 Canadian Conference on Computer and Robot Vision.

[21]  Shengwei Zhao,et al.  Occluded Face Recognition in the Wild by Identity-Diversity Inpainting , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[22]  Michael Goh Kah Ong,et al.  Facial Expression Recognition Using a Hybrid CNN-SIFT Aggregator , 2017, MIWAI.

[23]  Pascal Vincent,et al.  Disentangling Factors of Variation for Facial Expression Recognition , 2012, ECCV.

[24]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[25]  Zechao Li,et al.  Nonpeaked Discriminant Analysis for Data Representation , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[26]  Stefan Winkler,et al.  Deep Learning for Emotion Recognition on Small Datasets using Transfer Learning , 2015, ICMI.

[27]  Yoshua Bengio,et al.  Challenges in Representation Learning: A Report on Three Machine Learning Contests , 2013, ICONIP.

[28]  Quoc V. Le,et al.  Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[29]  Yu Qiao,et al.  A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.

[30]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xiuzhuang Zhou,et al.  Facial Depression Recognition by Deep Joint Label Distribution and Metric Learning , 2022, IEEE Transactions on Affective Computing.

[32]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[33]  Ping Liu,et al.  Identity-Aware Convolutional Neural Network for Facial Expression Recognition , 2017, 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017).

[34]  Matti Pietikäinen,et al.  Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Daijin Kim,et al.  Spatio-Temporal Slowfast Self-Attention Network For Action Recognition , 2020, 2020 IEEE International Conference on Image Processing (ICIP).

[37]  Konstantinos Demertzis,et al.  Large-Scale Geospatial Data Analysis: Geographic Object-Based Scene Classification in Remote Sensing Images by GIS and Deep Residual Learning , 2020, EANN.

[38]  Xiaogang Wang,et al.  Residual Attention Network for Image Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Guodong Guo,et al.  Visually Interpretable Representation Learning for Depression Recognition from Facial Images , 2020, IEEE Transactions on Affective Computing.

[40]  Zhengyou Zhang,et al.  Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[41]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[42]  Cha Zhang,et al.  Image based Static Facial Expression Recognition with Multiple Deep Network Learning , 2015, ICMI.

[43]  Tamás D. Gedeon,et al.  Collecting Large, Richly Annotated Facial-Expression Databases from Movies , 2012, IEEE MultiMedia.

[44]  Konstantinos Demertzis,et al.  GeoAI: A Model-Agnostic Meta-Ensemble Zero-Shot Learning Method for Hyperspectral Image Analysis and Classification , 2020, Algorithms.

[45]  Di Huang,et al.  Discriminative Attention-based Convolutional Neural Network for 3D Facial Expression Recognition , 2019, 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019).

[46]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Soo-Young Lee,et al.  Hierarchical committee of deep convolutional neural networks for robust facial expression recognition , 2016, Journal on Multimodal User Interfaces.

[48]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[49]  Jie Shao,et al.  Three convolutional neural network models for facial expression recognition in the wild , 2019, Neurocomputing.

[50]  Jian Yang,et al.  L1-Norm Distance Linear Discriminant Analysis Based on an Effective Iterative Algorithm , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[51]  Tal Hassner,et al.  Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns , 2015, ICMI.

[52]  Song Bai,et al.  Triplet-Center Loss for Multi-view 3D Object Retrieval , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Mohammad H. Mahoor,et al.  Going deeper in facial expression recognition using deep neural networks , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[54]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[55]  Jia Deng,et al.  Stacked Hourglass Networks for Human Pose Estimation , 2016, ECCV.