Exercise? I thought you said 'Extra Fries': Leveraging Sentence Demarcations and Multi-hop Attention for Meme Affect Analysis

Today’s Internet is awash in memes, which are typically humorous, satirical, or ironic and make people laugh. According to a survey, 33% of social media users aged 13-35 send memes every day, and more than 50% send them every week. Some of these memes spread rapidly within a very short time-frame, and their virality depends on the novelty of their textual and visual content. A few convey positive messages, such as funny or motivational quotes, while others are meant to mock or hurt someone’s feelings through sarcastic or offensive messages. Despite the appealing nature of memes and their rapid emergence on social media, effective analysis of memes has not been attempted to the extent it deserves. Recently, a pioneering attempt was made in this direction in SemEval’20 by organizing a shared task on ‘Memotion Analysis’ (meme emotion analysis). The competition attracted more than 500 participants, with final submissions of 23-32 systems across its three sub-tasks. In this paper, we attempt to solve the same set of tasks posed in the SemEval’20 Memotion Analysis competition. We propose a multi-hop attention-based deep neural network framework, called MHA-Meme, whose prime objective is to leverage the spatial-domain correspondence between the visual modality (an image) and the various textual segments to extract fine-grained feature representations for classification. We evaluate MHA-Meme on the ‘Memotion Analysis’ dataset for all three sub-tasks: sentiment classification, affect classification, and affect class quantification. Our comparative study shows state-of-the-art performance of MHA-Meme on all three tasks compared to the top systems that participated in the competition. Unlike the baselines, which perform inconsistently across the three tasks, MHA-Meme outperforms them on all tasks on average. Moreover, we validate the generalizability of MHA-Meme on another set of manually annotated test samples and observe consistent performance. Finally, we establish the interpretability of MHA-Meme.
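To make the multi-hop attention idea concrete, the snippet below is a minimal PyTorch sketch that attends over image-region features conditioned on the encoding of one textual segment, refining the query over several hops. All module names, dimensions, and the number of hops are illustrative assumptions; this is not the exact MHA-Meme architecture, only the general mechanism it builds on.

import torch
import torch.nn as nn

class MultiHopAttention(nn.Module):
    def __init__(self, img_dim: int, txt_dim: int, hidden_dim: int, hops: int = 3):
        super().__init__()
        self.hops = hops
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.txt_proj = nn.Linear(txt_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, img_regions: torch.Tensor, txt_segment: torch.Tensor) -> torch.Tensor:
        # img_regions: (batch, num_regions, img_dim), e.g. CNN feature-map cells
        # txt_segment: (batch, txt_dim), encoding of one OCR'd text segment
        query = self.txt_proj(txt_segment)               # (batch, hidden_dim)
        keys = self.img_proj(img_regions)                # (batch, num_regions, hidden_dim)
        for _ in range(self.hops):
            # Score each image region against the current query, then attend.
            energy = self.score(torch.tanh(keys + query.unsqueeze(1)))  # (batch, R, 1)
            alpha = torch.softmax(energy, dim=1)                        # attention weights
            context = (alpha * keys).sum(dim=1)                         # (batch, hidden_dim)
            # Each hop refines the query with the attended visual context.
            query = query + context
        return query  # fused, text-conditioned visual representation

# Usage with random tensors standing in for real image/text encoders.
fused = MultiHopAttention(img_dim=2048, txt_dim=768, hidden_dim=256)(
    torch.randn(4, 49, 2048), torch.randn(4, 768)
)
print(fused.shape)  # torch.Size([4, 256])

In the full model, one such fused representation per textual segment would feed a classification head for each sub-task.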
