Paper information - Discriminative attention-augmented feature learning for facial expression recognition in the wild

Facial expression recognition (FER) in the wild is challenging because of unconstrained conditions such as varying head poses, illumination, and occlusions. In addition, the performance of an FER system degrades significantly under the large intra-class variation and inter-class similarity of facial expressions in real-world scenarios. To mitigate these problems, we propose a novel approach, Discriminative Attention-augmented Feature Learning Convolution Neural Network (DAF-CNN), which learns discriminative expression-related representations for FER. First, we develop a 3D attention mechanism for feature refinement that selectively focuses on informative channel entries and salient spatial regions of a convolutional neural network feature map. Second, a deep metric loss termed Triplet-Center (TC) loss is incorporated to further enhance the discriminative power of the deeply learned features with an expression-similarity constraint. The TC loss simultaneously minimizes intra-class distance and maximizes inter-class distance so that the learned features are both compact and well separated. Extensive experiments on two representative facial expression datasets (FER-2013 and SFEW 2.0) demonstrate that DAF-CNN effectively captures discriminative feature representations and achieves competitive or even superior FER performance compared with state-of-the-art FER methods.
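
The abstract does not give the exact form of the TC loss, so the following is only a minimal NumPy sketch of the commonly used hinge formulation of a triplet-center loss, not the authors' implementation. It assumes one learnable center per expression class, a halved squared Euclidean distance, and averaging over the batch. The function name `triplet_center_loss`, the `margin` default, and the toy data are illustrative assumptions.

```python
import numpy as np


def triplet_center_loss(features, labels, centers, margin=1.0):
    """Hinge-style triplet-center loss, averaged over the batch (assumed form).

    features: (N, D) array of embeddings
    labels:   (N,) integer class indices
    centers:  (C, D) array with one center per class
    """
    # Halved squared Euclidean distance from every sample to every center: (N, C).
    diffs = features[:, None, :] - centers[None, :, :]
    dists = 0.5 * np.sum(diffs ** 2, axis=2)

    rows = np.arange(features.shape[0])
    pos = dists[rows, labels]        # distance to the sample's own class center

    masked = dists.copy()
    masked[rows, labels] = np.inf    # exclude the own-class center
    neg = masked.min(axis=1)         # distance to the nearest other-class center

    # Penalize samples whose own center is not closer than the nearest
    # other-class center by at least `margin`.
    return float(np.mean(np.maximum(pos + margin - neg, 0.0)))


# Toy example: two class centers on the x-axis (illustrative values only).
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
well_separated = triplet_center_loss(
    np.array([[0.0, 0.0], [4.0, 0.0]]), np.array([0, 1]), centers
)
ambiguous = triplet_center_loss(np.array([[2.0, 0.0]]), np.array([0]), centers)
print(well_separated, ambiguous)
```

In this sketch, a sample sitting on its own center incurs no loss once the other center is far enough away, while a sample equidistant from both centers is penalized by exactly the margin. This mirrors the abstract's goal of features that are compact within a class and separated between classes.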
