Various syncretic co‐attention network for multimodal sentiment analysis

Multimedia content shared on social networks reveals public sentiment toward specific events. It is therefore necessary, for real‐world applications, to automatically analyze the sentiment of the abundant multimedia data posted by the public. However, single‐modal sentiment analysis approaches neglect the internal connections between textual and visual content, and current multimodal methods fail to exploit the multilevel semantic relations among heterogeneous features. In this article, we propose the various syncretic co‐attention network (VSCN) to uncover the intricate multilevel correspondences between multimodal data and to combine them with the unique information of each modality for integrated, complementary sentiment classification. Specifically, a multilevel co‐attention module is constructed to explore localized correspondences between each image region and each text word, as well as holistic correspondences between global visual information and context‐based textual semantics. The single‐modal features can then be fused at each of these levels. In addition to the fused multimodal features, VSCN also considers the unique information of each modality and integrates all of them into an end‐to‐end framework for sentiment analysis. Experiments on three constructed real‐world datasets and a benchmark dataset, the Visual Sentiment Ontology (VSO), yield superior results that demonstrate the effectiveness of VSCN. Qualitative analyses are also given to explain the method in depth.
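To make the idea of multilevel co‐attention more concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not specify the architecture, so the dot‐product affinity, the mean pooling, the element‐wise product used for the holistic level, and the function names `local_co_attention` and `fuse` are all assumptions made for illustration. It assumes region and word features have already been projected to a shared dimension `d`.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def local_co_attention(regions, words):
    """Localized level (assumed form): every region attends over words and
    every word attends over regions.

    regions: (R, d) image-region features
    words:   (T, d) word features
    """
    affinity = regions @ words.T                  # (R, T) region-word scores
    region_to_word = softmax(affinity, axis=1)    # each row sums to 1
    word_to_region = softmax(affinity, axis=0)    # each column sums to 1
    text_for_regions = region_to_word @ words     # (R, d) text summary per region
    image_for_words = word_to_region.T @ regions  # (T, d) image summary per word
    return text_for_regions, image_for_words


def fuse(regions, words):
    """Concatenate local, holistic, and modality-specific features.

    The pooling and fusion operators here are illustrative assumptions.
    """
    text_for_regions, image_for_words = local_co_attention(regions, words)
    local = np.concatenate([text_for_regions.mean(axis=0),
                            image_for_words.mean(axis=0)])  # (2d,)
    global_visual = regions.mean(axis=0)                     # (d,)
    global_text = words.mean(axis=0)                         # (d,)
    holistic = global_visual * global_text                   # (d,)
    # Keep each modality's own global feature alongside the fused ones.
    return np.concatenate([local, holistic, global_visual, global_text])
```

With `R` regions, `T` words, and feature size `d`, `fuse` returns a vector of length `5 * d` that a downstream classifier could consume; in a trained model the pooling and fusion steps would typically be learned rather than fixed.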
