Visual-textual sentiment classification with bi-directional multi-level attention networks

Abstract Social networks have become an inseparable part of our daily lives, so automatic sentiment analysis of social media content is of great significance for identifying people's viewpoints, attitudes, and emotions on social websites. Most existing work has concentrated on sentiment analysis of a single modality, such as image or text, and thus cannot handle social media content that combines multiple modalities, including both image and text. Although some studies have attempted multi-modal sentiment analysis, the complicated correlations between the two modalities have not been fully explored. In this paper, we propose a novel Bi-Directional Multi-Level Attention (BDMLA) model that exploits the complementary and comprehensive information between the image and text modalities for joint visual-textual sentiment classification. Specifically, to highlight the emotional regions and words in an image–text pair, we propose a visual attention network and a semantic attention network. The visual attention network lets the region features of the image interact with multiple semantic levels of the text (word, phrase, and sentence) to obtain the attended visual features. The semantic attention network lets the semantic features of the text interact with multiple visual levels of the image (global and local) to obtain the attended semantic features. The attended visual and semantic features from the two attention networks are then unified into a holistic framework to conduct visual-textual sentiment classification. Proof-of-concept experiments on three real-world datasets verify the effectiveness of our model.
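To make the described architecture concrete, below is a minimal PyTorch sketch of the bi-directional multi-level attention idea, not the authors' implementation: all module names (SoftAttention, BDMLASketch), feature dimensions, the additive soft-attention scoring, and the simple mean-pooling of levels are illustrative assumptions.

```python
# Hedged sketch of bi-directional multi-level attention for visual-textual
# sentiment classification. Names, dimensions, and scoring are assumptions,
# not the BDMLA paper's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SoftAttention(nn.Module):
    """Attend over a set of candidate features, guided by a query feature."""

    def __init__(self, query_dim, cand_dim, hidden_dim=256):
        super().__init__()
        self.q_proj = nn.Linear(query_dim, hidden_dim)
        self.c_proj = nn.Linear(cand_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, query, candidates):
        # query: (B, query_dim); candidates: (B, N, cand_dim)
        h = torch.tanh(self.q_proj(query).unsqueeze(1) + self.c_proj(candidates))
        weights = F.softmax(self.score(h), dim=1)        # (B, N, 1)
        return (weights * candidates).sum(dim=1)          # (B, cand_dim)


class BDMLASketch(nn.Module):
    def __init__(self, region_dim=2048, text_dim=300, num_classes=2):
        super().__init__()
        # Visual attention: word-, phrase-, and sentence-level text features
        # each guide attention over image region features.
        self.vis_att = nn.ModuleList(
            [SoftAttention(text_dim, region_dim) for _ in range(3)]
        )
        # Semantic attention: global and local image features each guide
        # attention over word features.
        self.sem_att = nn.ModuleList(
            [SoftAttention(region_dim, text_dim) for _ in range(2)]
        )
        fused_dim = 3 * region_dim + 2 * text_dim
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, regions, global_img, words, phrases, sentence):
        # regions: (B, R, region_dim); global_img: (B, region_dim)
        # words: (B, W, text_dim); phrases, sentence: (B, text_dim)
        text_levels = [words.mean(dim=1), phrases, sentence]
        attended_visual = [att(q, regions) for att, q in zip(self.vis_att, text_levels)]
        visual_levels = [global_img, regions.mean(dim=1)]
        attended_semantic = [att(q, words) for att, q in zip(self.sem_att, visual_levels)]
        # Unify both directions of attended features for classification.
        fused = torch.cat(attended_visual + attended_semantic, dim=-1)
        return self.classifier(fused)
```

Under these assumptions, the sketch mirrors the abstract's structure: each semantic level of the text selects emotional image regions, each visual level of the image selects emotional words, and the attended features from both directions are concatenated and fed to a sentiment classifier.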
