Context Based Emotion Recognition Using EMOTIC Dataset

In our everyday lives and social interactions we often try to perceive the emotional states of other people. There has been a lot of research on giving machines a similar capacity for recognizing emotions. From a computer vision perspective, most previous efforts have focused on analyzing facial expressions and, in some cases, body pose. Some of these methods work remarkably well in specific settings; however, their performance is limited in natural, unconstrained environments. Psychological studies show that the scene context, in addition to facial expression and body pose, provides important information for our perception of people's emotions. However, processing context for automatic emotion recognition has not been explored in depth, partly due to the lack of suitable data. In this paper we present EMOTIC, a dataset of images of people in a diverse set of natural situations, annotated with their apparent emotion. The EMOTIC dataset combines two different types of emotion representation: (1) a set of 26 discrete categories, and (2) the continuous dimensions Valence, Arousal, and Dominance. We also present a detailed statistical and algorithmic analysis of the dataset, along with an analysis of annotator agreement. Using the EMOTIC dataset we train different CNN models for emotion recognition, combining the information from the bounding box containing the person with contextual information extracted from the scene. Our results show how scene context provides important information for automatically recognizing emotional states and motivate further research in this direction.
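To make the described approach concrete, below is a minimal sketch of a two-branch CNN that fuses features from the person's bounding-box crop with features from the whole scene, with two output heads matching EMOTIC's annotation scheme (26 discrete categories and the continuous Valence, Arousal, and Dominance dimensions). This is not the authors' exact architecture: the PyTorch framing, the ResNet-18 backbones, the feature sizes, and the fusion layer are illustrative assumptions.

```python
# Hypothetical sketch of a body + scene-context fusion model for EMOTIC-style
# annotations. Backbones, dimensions, and fusion strategy are assumptions.
import torch
import torch.nn as nn
from torchvision import models

class TwoBranchEmotionNet(nn.Module):
    def __init__(self, num_categories=26, num_continuous=3):
        super().__init__()
        # Body branch: features extracted from the person bounding-box crop.
        self.body_branch = models.resnet18(weights=None)
        self.body_branch.fc = nn.Identity()          # 512-d feature vector
        # Context branch: features extracted from the full scene image.
        self.context_branch = models.resnet18(weights=None)
        self.context_branch.fc = nn.Identity()       # 512-d feature vector
        # Fuse both feature vectors, then branch into the two prediction heads.
        self.fusion = nn.Sequential(nn.Linear(512 + 512, 256), nn.ReLU())
        self.category_head = nn.Linear(256, num_categories)  # multi-label logits
        self.vad_head = nn.Linear(256, num_continuous)        # V, A, D regression

    def forward(self, body_crop, full_image):
        f_body = self.body_branch(body_crop)
        f_ctx = self.context_branch(full_image)
        fused = self.fusion(torch.cat([f_body, f_ctx], dim=1))
        return self.category_head(fused), self.vad_head(fused)

# Usage sketch: a multi-label loss (e.g. BCE with logits) for the 26 categories
# and a regression loss for the continuous dimensions, trained jointly.
model = TwoBranchEmotionNet()
body = torch.randn(4, 3, 224, 224)    # person crops
scene = torch.randn(4, 3, 224, 224)   # full images
cat_logits, vad_pred = model(body, scene)
```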
