Video Emotion Recognition Using Local Enhanced Motion History Image and CNN-RNN Networks

This paper focuses on the recognition of facial expressions in video sequences and proposes a local-with-global method based on a local enhanced motion history image (LEMHI) and CNN-RNN networks. On the one hand, the traditional motion history image (MHI) method is improved by using detected facial landmarks as attention areas that boost local values in the difference-image calculation, so that the motion of crucial facial units can be captured effectively; the generated LEMHI is then fed into a CNN for classification. On the other hand, a CNN-LSTM model is used as a global feature extractor and classifier for video emotion recognition. Finally, a weighted summation whose weights are selected by random search is adopted as the late-fusion strategy for the final prediction. Experiments on the AFEW, CK+ and MMI datasets using a subject-independent validation scheme demonstrate that the integrated framework achieves better performance than state-of-the-art methods.
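The abstract does not give the LEMHI equations, so the following is only a minimal NumPy sketch of the idea under stated assumptions. It uses the classic MHI update (a moving pixel is set to the history length `tau`, every other pixel decays by 1) and, as the "local enhancement", multiplies the frame-difference image by a hypothetical attention map that equals `gain` inside square windows of half-width `radius` around each landmark and 1.0 elsewhere, before thresholding. The names `landmark_attention_map`, `local_enhanced_mhi`, `gain`, `radius` and `threshold` are illustrative assumptions, not the authors' parameters; landmarks are assumed to be integer `(x, y)` pixel coordinates from some external detector.

```python
import numpy as np


def landmark_attention_map(shape, landmarks, radius=5, gain=2.0):
    # Hypothetical attention weights: 1.0 everywhere, `gain` inside a
    # (2*radius+1)-pixel square window centred on each (x, y) landmark.
    h, w = shape
    weights = np.ones((h, w), dtype=np.float64)
    for x, y in landmarks:
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        weights[y0:y1, x0:x1] = gain
    return weights


def local_enhanced_mhi(frames, landmarks_per_frame, threshold=30.0,
                       radius=5, gain=2.0):
    # frames: sequence of 2-D grayscale arrays of equal shape.
    # landmarks_per_frame[t]: list of (x, y) landmarks for frame t.
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    tau = len(frames) - 1  # assumption: the history spans the whole clip
    mhi = np.zeros_like(frames[0])
    for t in range(1, len(frames)):
        diff = np.abs(frames[t] - frames[t - 1])
        weights = landmark_attention_map(diff.shape, landmarks_per_frame[t],
                                         radius, gain)
        # Boosting the difference near landmarks lets subtle facial-unit
        # motion pass a threshold it would otherwise miss.
        moving = diff * weights >= threshold
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalised to [0, 1], recent motion is brightest
```

With `gain=1.0` the function reduces to a plain thresholded MHI, which makes it easy to compare the enhanced and unenhanced images on the same clip before feeding either to a CNN.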
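The late-fusion step can be illustrated with a small sketch, assuming that each model (for example the LEMHI-CNN and the CNN-LSTM) outputs a class-score matrix on a validation set, that candidate weights are drawn uniformly at random and normalised to sum to 1, and that the candidate with the highest validation accuracy is kept. The trial count, the sampling scheme and the selection criterion are assumptions; the abstract only states that a random-search weighted summation is used. The function names are hypothetical.

```python
import numpy as np


def fuse(stacked_scores, weights):
    # Weighted sum over the model axis: (n_models, n, c) -> (n, c).
    return np.tensordot(weights, stacked_scores, axes=1)


def accuracy(scores, labels):
    return float(np.mean(np.argmax(scores, axis=1) == labels))


def random_search_fusion(val_scores, val_labels, n_trials=1000, seed=0):
    # val_scores: list of (n_samples, n_classes) arrays, one per model.
    stacked = np.stack([np.asarray(s, dtype=np.float64) for s in val_scores])
    labels = np.asarray(val_labels)
    n_models = stacked.shape[0]
    rng = np.random.default_rng(seed)
    best_w = np.full(n_models, 1.0 / n_models)  # start from equal weights
    best_acc = accuracy(fuse(stacked, best_w), labels)
    for _ in range(n_trials):
        w = rng.random(n_models)
        w /= w.sum()  # normalise so the weights sum to 1
        acc = accuracy(fuse(stacked, w), labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

The returned weights would then be applied unchanged to the test-set scores, e.g. `fuse(np.stack(test_scores), best_w)`. Selecting weights on held-out data rather than on the test set keeps the subject-independent evaluation honest.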
