Occlusion Aware Facial Expression Recognition Using CNN With Attention Mechanism

Facial expression recognition in the wild is challenging because of various unconstrained conditions. Although existing facial expression classifiers are nearly perfect at analyzing constrained frontal faces, they perform poorly on the partially occluded faces that are common in the wild. In this paper, we propose a convolutional neural network (CNN) with attention mechanism (ACNN) that can perceive the occluded regions of the face and focus on the most discriminative unoccluded regions. ACNN is an end-to-end learning framework that combines multiple representations from facial regions of interest (ROIs). Each representation is weighted by a proposed gate unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different ROIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global–local-based ACNN (gACNN). pACNN attends only to local facial patches. gACNN integrates local representations at the patch level with a global representation at the image level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and modified versions of those datasets with synthesized facial occlusions. Experimental results show that ACNNs improve recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with a CNN without the gate unit, ACNNs are capable of shifting attention from occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
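The gating idea described above can be illustrated with a minimal sketch. It is not the ACNN implementation: the abstract does not specify how the gate unit is parameterized, so the sigmoid-of-linear-score gate, the function names `gate_weight` and `gated_fusion`, and the choice to append the global vector unweighted are all assumptions made for illustration. Plain Python lists stand in for the CNN feature maps of each facial patch.

```python
import math


def gate_weight(feature, w, b):
    """Return a scalar weight in (0, 1) computed from the patch feature itself.

    Assumption: the gate is a sigmoid over a linear score; the paper's gate
    unit may be structured differently.
    """
    score = sum(wi * xi for wi, xi in zip(w, feature)) + b
    return 1.0 / (1.0 + math.exp(-score))


def gated_fusion(patch_features, gate_params, global_feature=None):
    """Weight each patch representation by its own gate, then concatenate.

    patch_features: list of feature vectors, one per facial patch (ROI).
    gate_params:    list of (w, b) pairs, one hypothetical gate per patch.
    global_feature: optional image-level vector, appended unweighted here
                    (an assumption) to mimic a global-local variant.
    """
    weights = []
    fused = []
    for feature, (w, b) in zip(patch_features, gate_params):
        alpha = gate_weight(feature, w, b)
        weights.append(alpha)
        fused.extend(alpha * x for x in feature)
    if global_feature is not None:
        fused.extend(global_feature)
    return fused, weights


if __name__ == "__main__":
    # Two toy patches: the second is all zeros, standing in for a patch
    # that carries no useful signal (for example, an occluded region).
    patches = [[1.0, 2.0], [0.0, 0.0]]
    params = [([1.0, 1.0], 0.0), ([1.0, 1.0], 0.0)]
    fused, weights = gated_fusion(patches, params, global_feature=[5.0])
    print(weights)
    print(fused)
```

In this toy setup the informative patch receives a larger weight than the zeroed one, which is the behavior the gate unit is meant to learn; in a trained network the gate parameters would be learned jointly with the classifier rather than fixed by hand.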
