Context-Aware Emotion Recognition Networks

Traditional techniques for emotion recognition have focused only on facial expression analysis, providing limited ability to encode the context that comprehensively represents emotional responses. We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expressions but also context information in a joint and boosting manner. The key idea is to hide human faces in a visual scene and to seek other contexts through an attention mechanism. Our networks consist of two sub-networks: two-stream encoding networks that separately extract features from the face and context regions, and adaptive fusion networks that fuse these features in an adaptive fashion. We also introduce a novel benchmark for context-aware emotion recognition, called CAER, which is more appropriate than existing benchmarks both qualitatively and quantitatively. On several benchmarks, CAER-Net demonstrates the effect of context on emotion recognition. Our dataset is available at http://caer-dataset.github.io.
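
The described architecture can be outlined as follows: a face stream encodes the cropped face, a context stream encodes the scene with the face region hidden and attends to the remaining contextual regions, and an adaptive fusion module weights the two feature vectors before classification. Below is a minimal PyTorch sketch of this two-stream idea; all module names, layer sizes, the attention pooling, the gating-based fusion, and the seven-way output are illustrative assumptions, not the authors' exact CAER-Net configuration.

```python
# Minimal sketch of a two-stream, context-aware emotion network.
# All names and sizes are hypothetical choices for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def conv_block(in_ch, out_ch):
    # Small conv/BN/ReLU/pool block shared by both streams (assumed sizes).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class FaceStream(nn.Module):
    # Encodes the cropped face region into a single feature vector.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, feat_dim)
        )

    def forward(self, face):
        x = self.encoder(face)                           # B x C x H x W
        return F.adaptive_avg_pool2d(x, 1).flatten(1)    # B x C


class ContextStream(nn.Module):
    # Encodes the scene with the face masked out, then applies a spatial
    # attention map so the stream focuses on other contextual cues.
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            conv_block(3, 32), conv_block(32, 64), conv_block(64, feat_dim)
        )
        self.attention = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, image, face_mask):
        # face_mask is 1 inside the face box, 0 elsewhere; hiding the face
        # forces this stream to rely on the surrounding context.
        hidden = image * (1.0 - face_mask)
        x = self.encoder(hidden)                          # B x C x H x W
        attn = torch.softmax(self.attention(x).flatten(2), dim=-1)
        attn = attn.view(x.size(0), 1, x.size(2), x.size(3))
        return (x * attn).sum(dim=(2, 3))                 # attention-pooled B x C


class AdaptiveFusion(nn.Module):
    # Learns per-stream weights so face and context features are fused adaptively.
    def __init__(self, feat_dim=128, num_classes=7):
        super().__init__()
        self.gate = nn.Linear(2 * feat_dim, 2)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, face_feat, ctx_feat):
        w = torch.softmax(self.gate(torch.cat([face_feat, ctx_feat], dim=1)), dim=1)
        fused = torch.cat([face_feat * w[:, :1], ctx_feat * w[:, 1:]], dim=1)
        return self.classifier(fused)


# Example forward pass with random tensors standing in for a face crop,
# the full image, and a binary face mask (hypothetical face box).
if __name__ == "__main__":
    face = torch.randn(2, 3, 96, 96)
    image = torch.randn(2, 3, 224, 224)
    mask = torch.zeros(2, 1, 224, 224)
    mask[:, :, 60:140, 80:160] = 1.0
    face_stream, ctx_stream, fusion = FaceStream(), ContextStream(), AdaptiveFusion()
    logits = fusion(face_stream(face), ctx_stream(image, mask))
    print(logits.shape)  # torch.Size([2, 7])
```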
