Multimodal sentiment analysis based on multi-head attention mechanism

Multimodal sentiment analysis remains a promising area of research with many open issues. The most basic of these are extracting reasonable unimodal features and designing a robust multimodal sentiment analysis model. This paper presents novel ways of extracting sentiment features from the visual, audio and text modalities, and uses these features to verify a multimodal sentiment analysis model based on the multi-head attention mechanism. The proposed model is evaluated on the Multimodal Opinion Utterances Dataset (MOUD) corpus and the CMU Multi-modal Opinion-level Sentiment Intensity (CMU-MOSI) corpus. Experimental results demonstrate the effectiveness of the proposed approach: the model achieves an accuracy of 90.43% on MOUD and 82.71% on CMU-MOSI, improvements of approximately 2 and 0.4 points, respectively, over state-of-the-art models.
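
To make the fusion idea concrete, the following is a minimal sketch of generic scaled dot-product multi-head self-attention applied to one feature vector per modality. It is not the architecture proposed in this paper, whose details the abstract does not give. The feature dimension, the number of heads, the random projection matrices (standing in for learned weights) and all function names are assumptions chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random projection; a stand-in for a learned weight matrix (assumption)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_attention(x, num_heads, rng):
    """Self-attention over x, a list of feature vectors (one per modality)."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads

    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        # Each head has its own query, key and value projections.
        q = matmul(x, random_matrix(d_model, d_head, rng))
        k = matmul(x, random_matrix(d_model, d_head, rng))
        v = matmul(x, random_matrix(d_model, d_head, rng))

        # Scaled dot-product scores between every pair of modalities.
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]

        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)

    # Concatenate the heads along the feature axis, then mix them linearly.
    concat = [[v for head in head_outputs for v in head[i]] for i in range(len(x))]
    fused = matmul(concat, random_matrix(d_model, d_model, rng))
    return fused, head_weights


# Hypothetical 8-dimensional features for the text, audio and visual modalities.
rng = random.Random(0)
features = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(3)]
fused, attn = multi_head_attention(features, num_heads=2, rng=rng)
```

Here `fused` holds one 8-dimensional vector per modality, each a mixture of all three inputs, and `attn[h][i][j]` is the weight that head `h` assigns to modality `j` when updating modality `i`. In a trained model the projections would be learned and the fused vectors passed to a classifier.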
