MERTA: micro-expression recognition with ternary attentions

Micro-expressions are a spontaneous and uncontrollable way of conveying emotion. They carry abundant psychological information, and their recognition is important in many fields. In recent years, with the rapid development of computer vision, research on facial expressions has matured, while research on micro-expressions remains a hot yet challenging topic. The main difficulty in recognizing micro-expressions lies in extracting discriminative features, because micro-expressions are extremely brief and subtle. To deal with this problem, this paper proposes a deep learning model that efficiently extracts discriminative features. Our model is based on three VGGNets and one Long Short-Term Memory (LSTM) network. The three VGGNets extract static and motion information, and three types of attention mechanism are jointly integrated to obtain more discriminative visual representations. The spatial features of a micro-expression sequence are then fed sequentially into the LSTM, which extracts spatio-temporal features and predicts the micro-expression category. Our algorithm is evaluated on the benchmark micro-expression dataset CASME II. Its effectiveness is demonstrated through extensive ablation analysis and comparison with state-of-the-art algorithms.
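The data flow described above (per-frame features from three streams, fed frame by frame into an LSTM that outputs a category) can be illustrated with a minimal sketch. It is not the authors' implementation: random vectors stand in for the three VGGNet outputs, the attention modules are omitted because the abstract does not specify them, fusion by concatenation is an assumption, and all names (`lstm_step`, `classify_sequence`, `demo`) and dimensions are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def lstm_step(x, h, c, params):
    # One standard LSTM step; params maps gate name -> (W, b), W acts on [x; h].
    z = x + h  # list concatenation of input and previous hidden state
    gates = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre = [a + bias for a, bias in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gates[name] = [act(p) for p in pre]
    c_new = [f * cc + i * g
             for f, cc, i, g in zip(gates["f"], c, gates["i"], gates["g"])]
    h_new = [o * math.tanh(cn) for o, cn in zip(gates["o"], c_new)]
    return h_new, c_new


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def classify_sequence(frames, params, W_out):
    # frames: list of frames; each frame holds three per-stream feature vectors.
    hidden = len(params["i"][1])
    h = [0.0] * hidden
    c = [0.0] * hidden
    for streams in frames:
        # Assumption: the three streams are fused by simple concatenation.
        x = [v for stream in streams for v in stream]
        h, c = lstm_step(x, h, c, params)
    # The final hidden state is mapped to class probabilities.
    return softmax(matvec(W_out, h))


def demo(seed=0, num_frames=6, feat_dim=4, hidden=5, num_classes=3):
    # Hypothetical sizes; random features stand in for VGGNet outputs.
    rng = random.Random(seed)
    in_dim = 3 * feat_dim
    params = {name: (rand_matrix(hidden, in_dim + hidden, rng), [0.0] * hidden)
              for name in "ifog"}
    W_out = rand_matrix(num_classes, hidden, rng)
    frames = [[[rng.random() for _ in range(feat_dim)] for _ in range(3)]
              for _ in range(num_frames)]
    return classify_sequence(frames, params, W_out)


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns a list of class probabilities for one synthetic sequence. In a real system the weights would be learned, and the per-frame features would come from trained convolutional networks rather than a random generator.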
