Context-Dependent Sentiment Analysis in User-Generated Videos

Multimodal sentiment analysis is a developing area of research that involves identifying sentiment in videos. Most current work treats utterances as independent entities, ignoring the interdependencies and relations among the utterances of a video. In this paper, we propose an LSTM-based model that enables utterances to capture contextual information from their surroundings in the same video, thereby aiding the classification process. Our method shows a 5-10% performance improvement over the state of the art and strong generalizability across datasets.
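The core idea above, running an LSTM over the sequence of utterances in one video so that each utterance's representation absorbs context from its neighbors, can be sketched in simplified form. This is an illustrative toy, not the paper's actual architecture: the feature dimensions, weight initialization, and function names are all assumptions, and real systems would use learned multimodal features and a trained classifier on top.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class LSTMCell:
    """Minimal single-layer LSTM cell over utterance feature vectors.

    Dimensions and random initialization are illustrative only.
    """
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.hid_dim = in_dim, hid_dim
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # one weight matrix and bias per gate:
        # i = input, f = forget, o = output, c = candidate
        self.W = {g: mat(hid_dim, in_dim + hid_dim) for g in "ifoc"}
        self.b = {g: [0.0] * hid_dim for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # concatenation [x; h] as a flat list
        def gate(g, act):
            return [act(sum(w * v for w, v in zip(row, z)) + b)
                    for row, b in zip(self.W[g], self.b[g])]
        i = gate("i", sigmoid)
        f = gate("f", sigmoid)
        o = gate("o", sigmoid)
        g = gate("c", math.tanh)
        # standard LSTM cell-state and hidden-state updates
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def contextual_states(cell, utterances):
    """Run the LSTM over one video's utterance sequence.

    Each returned hidden state is a context-aware representation of the
    corresponding utterance, carrying information from preceding utterances.
    """
    h = [0.0] * cell.hid_dim
    c = [0.0] * cell.hid_dim
    states = []
    for x in utterances:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states

# Three utterances of one video, each a (hypothetical) 3-d feature vector.
video = [[0.2, -0.1, 0.5], [0.9, 0.3, -0.4], [-0.6, 0.1, 0.2]]
cell = LSTMCell(in_dim=3, hid_dim=4)
states = contextual_states(cell, video)  # one 4-d context-aware vector per utterance
```

In the actual model a classifier would then predict each utterance's sentiment from its contextual state rather than from its isolated features; a bidirectional variant would additionally expose context from later utterances.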
