PRHLT-UPV at SemEval-2020 Task 8: Study of Multimodal Techniques for Memes Analysis

This paper describes the system submitted by the PRHLT-UPV team to Task 8 of SemEval-2020: Memotion Analysis. We propose a multimodal model that combines pretrained models of the BERT and VGG architectures: BERT processes the textual information and VGG processes the images. The model classifies memes according to the presence of offensive, sarcastic, humorous, and motivating content, and it is also used to carry out sentiment analysis of memes. In the experiments, the model is compared with other approaches to assess the relevance of the multimodal model. The results are encouraging: the system reached good positions on the final leaderboard of the competition.
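As an illustration of how text and image representations might be combined, the following is a minimal sketch and not the submitted system. It assumes fusion by simple concatenation, a 768-dimensional text vector (the hidden size of BERT-base), a 4096-dimensional image vector (the penultimate fully connected layer of VGG-16), and an untrained linear head with one sigmoid output per category. The abstract does not state the fusion strategy or the layer sizes, so all of these choices, along with the names `fuse` and `LinearHead`, are hypothetical. Random vectors stand in for the real BERT and VGG outputs.

```python
import math
import random

# Assumed feature sizes; the abstract does not specify them.
TEXT_DIM = 768     # hidden size of BERT-base
IMAGE_DIM = 4096   # penultimate fully connected layer of VGG-16
LABELS = ["offensive", "sarcastic", "humorous", "motivating"]


def sigmoid(x):
    """Logistic function mapping a real score to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(text_vec, image_vec):
    """Fuse the two modalities by concatenation (an assumption)."""
    return list(text_vec) + list(image_vec)


class LinearHead:
    """Hypothetical, untrained linear layer with one sigmoid unit per label."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [
            [rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]
        self.bias = [0.0] * out_dim

    def __call__(self, features):
        if len(features) != len(self.weights[0]):
            raise ValueError("feature vector has the wrong length")
        return [
            sigmoid(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(self.weights, self.bias)
        ]


# Random stand-ins for the encoder outputs of a single meme.
rng = random.Random(42)
text_features = [rng.uniform(-1.0, 1.0) for _ in range(TEXT_DIM)]
image_features = [rng.uniform(-1.0, 1.0) for _ in range(IMAGE_DIM)]

head = LinearHead(TEXT_DIM + IMAGE_DIM, len(LABELS))
scores = head(fuse(text_features, image_features))
for label, score in zip(LABELS, scores):
    print(f"{label}: {score:.3f}")
```

Because the head is untrained and the inputs are random, the printed scores carry no meaning; the sketch only shows the data flow from two unimodal feature vectors to per-category scores.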
