Multimodal Sentiment Detection Based on Multi-channel Graph Neural Networks

With the popularity of smartphones, we have witnessed the rapid proliferation of multimodal posts on various social media platforms. We observe that multimodal sentiment expression has specific global characteristics, such as the interdependencies of objects or scenes within an image. However, most previous studies only considered the representation of a single image-text post and failed to capture the global co-occurrence characteristics of the dataset. In this paper, we propose Multi-channel Graph Neural Networks with Sentiment-awareness (MGNNS) for image-text sentiment detection. Specifically, we first encode the different modalities to capture hidden representations. Then, we introduce multi-channel graph neural networks to learn multimodal representations based on the global characteristics of the dataset. Finally, we perform in-depth multimodal fusion with a multi-head attention mechanism to predict the sentiment of image-text pairs. Extensive experiments conducted on three publicly available datasets demonstrate the effectiveness of our approach for multimodal sentiment detection.
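To make the three-stage pipeline concrete, the following is a minimal PyTorch sketch of the described architecture: per-modality encoding, one graph-convolution channel per modality view, and multi-head attention fusion for classification. It is not the authors' implementation; the class names, feature dimensions, simplified single-layer GCN, and identity adjacency matrices in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SimpleGCNLayer(nn.Module):
    """One graph-convolution step: aggregate node features through a
    (row-normalized) adjacency matrix, then apply a linear projection."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (num_nodes, in_dim), adj: (num_nodes, num_nodes)
        return torch.relu(self.linear(adj @ x))


class MGNNSSketch(nn.Module):
    """Hypothetical three-stage pipeline: modality encoders, per-channel
    graph reasoning over global co-occurrence graphs, and multi-head
    attention fusion for sentiment prediction."""
    def __init__(self, text_dim=300, obj_dim=2048, scene_dim=512,
                 hidden=256, num_classes=3):
        super().__init__()
        # Stage 1: modality encoders (dimensions are illustrative).
        self.text_proj = nn.Linear(text_dim, hidden)
        self.obj_proj = nn.Linear(obj_dim, hidden)
        self.scene_proj = nn.Linear(scene_dim, hidden)
        # Stage 2: one GNN channel per modality view (text, objects, scenes).
        self.text_gcn = SimpleGCNLayer(hidden, hidden)
        self.obj_gcn = SimpleGCNLayer(hidden, hidden)
        self.scene_gcn = SimpleGCNLayer(hidden, hidden)
        # Stage 3: multi-head attention fusion across the channel summaries.
        self.fusion = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_x, text_adj, obj_x, obj_adj, scene_x, scene_adj):
        # Each channel reasons over its own global co-occurrence graph.
        t = self.text_gcn(self.text_proj(text_x), text_adj).mean(dim=0)
        o = self.obj_gcn(self.obj_proj(obj_x), obj_adj).mean(dim=0)
        s = self.scene_gcn(self.scene_proj(scene_x), scene_adj).mean(dim=0)
        # Stack channel summaries as a length-3 sequence and fuse via self-attention.
        channels = torch.stack([t, o, s]).unsqueeze(0)         # (1, 3, hidden)
        fused, _ = self.fusion(channels, channels, channels)   # (1, 3, hidden)
        return self.classifier(fused.mean(dim=1))              # (1, num_classes)


# Toy usage with random node features and identity adjacencies.
if __name__ == "__main__":
    model = MGNNSSketch()
    logits = model(
        torch.randn(10, 300), torch.eye(10),   # 10 word nodes
        torch.randn(5, 2048), torch.eye(5),    # 5 object nodes
        torch.randn(4, 512), torch.eye(4),     # 4 scene nodes
    )
    print(logits.shape)  # torch.Size([1, 3])
```

In this sketch the graph channels are kept independent until the fusion stage, so each modality's global co-occurrence structure contributes its own summary before the attention layer mixes them.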
