Social Image Sentiment Analysis by Exploiting Multimodal Content and Heterogeneous Relations

In the era of social big data, sentiment analysis is attracting increasing attention for its capacity to reveal individuals' attitudes and feelings. Traditional sentiment analysis methods focus on a single modality and become ineffective as enormous volumes of data with multiple manifestations emerge on social websites. Recently, multimodal learning approaches have been proposed to capture the relations between image and text, but they stay at the region level and ignore the fact that image channels are also closely correlated with semantic information. In addition, social images on social platforms are closely connected by various types of relations, which are also conducive to sentiment classification but are neglected by most existing works. In this article, we propose an attention-based heterogeneous relational model that improves multimodal sentiment analysis performance by incorporating rich social information. Specifically, we propose a progressive dual attention module to capture the correlations between image and text and learn a joint image-text representation from the perspective of content information. A channel attention schema highlights semantically rich image channels, and a region attention schema then highlights emotional regions based on the attended channels. After that, we construct a heterogeneous relation network and extend the graph convolutional network to aggregate content information from social contexts as a complement, learning high-quality representations of social images. Our proposal is thoroughly evaluated on two benchmark datasets, and the experimental results demonstrate the superiority of the proposed model.
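The abstract does not spell out the attention equations, so the following is a minimal PyTorch sketch of how a progressive dual attention module could be wired: channel attention first re-weights the CNN feature map, and region attention then scores spatial locations conditioned on the text representation. All module names, dimensions, and the fusion-by-concatenation step are illustrative assumptions, not the paper's definitive design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveDualAttention(nn.Module):
    """Hypothetical sketch: channel attention followed by text-guided
    region attention over a CNN feature map of shape (B, C, H, W)."""

    def __init__(self, channels: int, text_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Channel attention: squeeze spatial dims, then score each channel.
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, channels),
        )
        # Region attention: score each spatial location given the text vector.
        self.region_proj = nn.Linear(channels + text_dim, hidden_dim)
        self.region_score = nn.Linear(hidden_dim, 1)

    def forward(self, feat_map: torch.Tensor, text_vec: torch.Tensor):
        # feat_map: (B, C, H, W) image features; text_vec: (B, T) text features
        b, c, h, w = feat_map.shape

        # --- Channel attention: highlight semantically rich channels ---
        pooled = feat_map.mean(dim=(2, 3))                   # (B, C)
        ch_weights = torch.sigmoid(self.channel_fc(pooled))  # (B, C)
        feat_map = feat_map * ch_weights.view(b, c, 1, 1)

        # --- Region attention computed on the channel-attended map ---
        regions = feat_map.flatten(2).transpose(1, 2)        # (B, H*W, C)
        text_exp = text_vec.unsqueeze(1).expand(-1, h * w, -1)
        scores = self.region_score(
            torch.tanh(self.region_proj(torch.cat([regions, text_exp], dim=-1)))
        )                                                    # (B, H*W, 1)
        alpha = F.softmax(scores, dim=1)
        attended = (alpha * regions).sum(dim=1)              # (B, C)

        # Joint image-text representation (fusion by concatenation, assumed)
        return torch.cat([attended, text_vec], dim=-1)
```

The progressive ordering is the point of the sketch: region scores are computed on the channel-attended map, so spatial locations are judged only after uninformative channels have been suppressed, mirroring the abstract's description of region attention operating on the attended channels.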
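Likewise, the abstract only states that a graph convolutional network is extended to aggregate content information over a heterogeneous relation network. Below is a minimal sketch of one relation-aware aggregation layer in the spirit of relational GCNs; the relation types (e.g., same-user, shared-hashtag), the dense normalized adjacency matrices, and the single-layer structure are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroRelationGCN(nn.Module):
    """Hypothetical sketch: one graph-convolution layer that aggregates
    neighbor content per relation type and adds a self-connection."""

    def __init__(self, in_dim: int, out_dim: int, num_relations: int):
        super().__init__()
        # One linear transform per relation type, plus one for the node itself.
        self.rel_weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)
        )
        self.self_weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, adjs: list) -> torch.Tensor:
        # x: (N, in_dim) node content features (e.g., dual-attention outputs);
        # adjs[r]: (N, N) row-normalized adjacency matrix of relation type r.
        out = self.self_weight(x)
        for adj, lin in zip(adjs, self.rel_weights):
            out = out + adj @ lin(x)  # normalized neighbor aggregation
        return F.relu(out)
```

In this reading, the content representation from the dual attention module would feed in as node features, and the relation-aware aggregation would supply the social-context complement before the final sentiment classifier.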
