Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis

Sentiment analysis of social multimedia data has attracted extensive research interest and has been applied to many tasks, such as election prediction and product evaluation. Sentiment analysis of a single modality (e.g., text or image) has been broadly studied, but far less attention has been paid to the sentiment analysis of multimodal data. Different modalities usually carry complementary information. Thus, it is necessary to learn the overall sentiment by combining the visual content with the accompanying text description. In this article, we propose a novel method, Attention-Based Modality-Gated Networks (AMGN), that exploits the correlation between the image and text modalities and extracts discriminative features for multimodal sentiment analysis. Specifically, a visual-semantic attention model is proposed to learn attended visual features for each word. To effectively combine the sentiment information of the image and text modalities, a modality-gated LSTM is proposed to learn multimodal features by adaptively selecting the modality that presents stronger sentiment information. Then a semantic self-attention model is proposed to automatically focus on the features that are discriminative for sentiment classification. Extensive experiments have been conducted on both manually annotated and machine weakly labeled datasets. Comparisons with state-of-the-art models demonstrate the superiority of our approach.
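The abstract does not give the exact equations, so the following is only a minimal, framework-free sketch of how the three stages could fit together. It rests on stated assumptions. The visual-semantic attention is taken to be dot-product attention of each word over image-region features. The modality gate is reduced to a single sigmoid scalar that mixes each word vector with its attended visual vector; the LSTM recurrence that AMGN wraps around the gate is omitted. The semantic self-attention is stood in for by dot-product attention pooling with a query vector. All function names (`attend_regions`, `gated_fusion`, `self_attention_pool`, `amgn_style_forward`) and parameters (`gate_w`, `gate_b`, `query`) are hypothetical and are not taken from the AMGN paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights: List[float], vectors: List[Vector]) -> Vector:
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(vectors[0]))]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def attend_regions(word: Vector, regions: List[Vector]) -> Vector:
    # ASSUMED visual-semantic attention: score every image region by its dot
    # product with the word vector (word and region vectors share one dimension),
    # then return the softmax-weighted sum of regions: one visual vector per word.
    return weighted_sum(softmax([dot(word, r) for r in regions]), regions)


def gated_fusion(word: Vector, visual: Vector, gate_w: Vector, gate_b: float) -> Vector:
    # HYPOTHETICAL scalar modality gate computed from the concatenation
    # [word; visual] (list + list concatenates), so gate_w must have length
    # len(word) + len(visual). AMGN places its gate inside an LSTM; omitted here.
    g = sigmoid(dot(gate_w, word + visual) + gate_b)
    # g near 1 keeps the text feature, g near 0 keeps the attended visual feature.
    return [g * t + (1.0 - g) * v for t, v in zip(word, visual)]


def self_attention_pool(features: List[Vector], query: Vector) -> Vector:
    # ASSUMED stand-in for semantic self-attention: weight each fused word
    # feature by its softmax-normalized dot product with a query vector.
    return weighted_sum(softmax([dot(query, f) for f in features]), features)


def amgn_style_forward(words: List[Vector], regions: List[Vector],
                       gate_w: Vector, gate_b: float, query: Vector) -> Vector:
    fused = [gated_fusion(w, attend_regions(w, regions), gate_w, gate_b) for w in words]
    return self_attention_pool(fused, query)


# Toy inputs: two 2-d word embeddings and three 2-d image-region features.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
gate_w = [0.0, 0.0, 0.0, 0.0]  # zero weights and zero bias give g = 0.5 for every word
pooled = amgn_style_forward(words, regions, gate_w, 0.0, query=[1.0, 1.0])
print(pooled)
```

The pooled vector is what a sentiment classifier would consume; learned weights, the gated LSTM recurrence, and the classifier itself are left out. Setting a large positive `gate_b` drives the gate toward 1 so the text dominates, which mirrors, in simplified form, the abstract's idea of adaptively selecting the modality with stronger sentiment information.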
