Context-Dependent Sentiment Analysis in User-Generated Videos

Multimodal sentiment analysis is a developing area of research that involves identifying sentiment in videos. Most current work treats utterances as independent entities, ignoring the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thereby aiding the classification process. Our method shows a 5-10% performance improvement over the state of the art and strong generalizability across datasets.
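The core idea above, running an LSTM over the sequence of utterances in one video so that each utterance's representation absorbs context from its neighbors, can be sketched in simplified form. This is an illustrative toy, not the paper's actual architecture: the feature dimensions, weight initialization, and function names are all assumptions, and real systems would use learned multimodal features and a trained classifier on top.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell over utterance feature vectors.

    Dimensions and random initialization are illustrative only.
    """
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.hid_dim = in_dim, hid_dim
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # one weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate
        self.W = {g: mat(hid_dim, in_dim + hid_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hid_dim for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenation [x; h] as a flat list
        def gate(g, act):
            return [act(sum(w * v for w, v in zip(row, z)) + b)
                    for row, b in zip(self.W[g], self.b[g])]
        i = gate("i", sigmoid)
        f = gate("f", sigmoid)
        o = gate("o", sigmoid)
        g = gate("c", math.tanh)
        # standard LSTM cell-state and hidden-state updates
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def contextual_states(cell, utterances):
    """Run the LSTM over one video's utterance sequence.

    Each returned hidden state is a context-aware representation of the
    corresponding utterance, carrying information from preceding utterances.
    """
    h = [0.0] * cell.hid_dim
    c = [0.0] * cell.hid_dim
    states = []
    for x in utterances:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states

# Three utterances of one video, each a (hypothetical) 3-d feature vector.
video = [[0.2, -0.1, 0.5], [0.9, 0.3, -0.4], [-0.6, 0.1, 0.2]]
cell = LSTMCell(in_dim=3, hid_dim=4)
states = contextual_states(cell, video)  # one 4-d context-aware vector per utterance
```

In the actual model a classifier would then predict each utterance's sentiment from its contextual state rather than from its isolated features; a bidirectional variant would additionally expose context from later utterances.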
