Multi-attention based Deep Neural Network with hybrid features for Dynamic Sequential Facial Expression Recognition

Abstract In interpersonal communication, facial expression is an important way to convey one's emotions. Researchers have devoted considerable effort to enabling computers to understand facial expressions as humans do. However, most existing work on dynamic sequence facial expression recognition fails to make full use of the combined advantages of shallow features (prior knowledge) and deep features (high-level semantics). This paper therefore implements a dynamic sequence facial expression recognition system that integrates shallow and deep features through attention mechanisms. To extract the shallow features, an Attention Shallow Model (ASModel) is proposed, which uses the relative positions of facial landmarks and the texture characteristics of local facial regions to describe the Action Units of the Facial Action Coding System. To exploit the strength of deep convolutional neural networks in expressing high-level features, an Attention Deep Model (ADModel) is designed to extract deep features from facial image sequences. Finally, the ASModel and the ADModel are integrated into a Multi-attention Shallow and Deep Model (MSDModel) to perform dynamic sequence facial expression recognition. Three kinds of attention mechanism are introduced: Self-Attention (SA), Weight-Attention (WA), and Convolution-Attention (CA). We evaluate the system on three publicly available databases, CK+, MMI, and Oulu-CASIA, and obtain performance superior to other state-of-the-art results.
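The general idea of attending over per-frame features and then fusing a shallow stream with a deep stream can be illustrated with a minimal NumPy sketch. This is not the paper's ASModel, ADModel, or MSDModel: the abstract does not specify their layers, so the sketch assumes standard scaled dot-product self-attention with identity projections, a simple softmax-weighted pooling over time, and fusion by concatenation. All function names, feature dimensions, and the random inputs are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(frames):
    # Scaled dot-product self-attention over a (T, d) sequence.
    # Assumption: queries, keys, and values are the raw frame features
    # (identity projections), chosen only to keep the sketch short.
    d = frames.shape[-1]
    scores = frames @ frames.T / np.sqrt(d)   # (T, T) frame-to-frame scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ frames, weights

def weighted_pool(frames, w):
    # Collapse a (T, d) sequence to a (d,) vector using per-frame weights
    # obtained from a hypothetical scoring vector w of shape (d,).
    alpha = softmax(frames @ w)               # (T,) weights over frames
    return alpha @ frames, alpha

def fuse(shallow_seq, deep_seq, w_shallow, w_deep):
    # Assumption: fusion is plain concatenation of the two pooled streams.
    s_ctx, _ = self_attention(shallow_seq)
    d_ctx, _ = self_attention(deep_seq)
    s_vec, _ = weighted_pool(s_ctx, w_shallow)
    d_vec, _ = weighted_pool(d_ctx, w_deep)
    return np.concatenate([s_vec, d_vec])

# Hypothetical inputs: 8 frames, 16-d shallow and 32-d deep features per frame.
rng = np.random.default_rng(0)
T = 8
shallow = rng.normal(size=(T, 16))
deep = rng.normal(size=(T, 32))
fused = fuse(shallow, deep, rng.normal(size=16), rng.normal(size=32))
print(fused.shape)
```

In a trained system the scoring vectors and projections would be learned, and the fused vector would feed a classifier over the expression classes; here they are random placeholders so that the sketch runs on its own.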
