CSECU-DSG at SemEval-2021 Task 6: Orchestrating Multimodal Neural Architectures for Identifying Persuasion Techniques in Texts and Images

Embedding persuasion techniques in memes is among the most impactful ways to influence people's mindsets. People are drawn to memes because they are stimulating and convincing, and hence memes are often exploited by tactfully engraving propaganda into their content with the intent of attaining a specific agenda. This paper describes our participation in the three subtasks featured by SemEval-2021 Task 6 on the detection of persuasion techniques in texts and images. For subtask 1, we utilize a fusion of logistic regression, decision tree, and a fine-tuned DistilBERT model. For subtask 2, we propose a system that consolidates a span-identification model and a multi-label classification model based on pre-trained BERT. We address the multi-modal multi-label classification of memes defined in subtask 3 with a ResNet50-based image model, a DistilBERT-based text model, and a multi-modal architecture combining a multi-kernel CNN+LSTM with an MLP. The outcomes illustrate the competitive performance of our systems.
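The fusion idea behind the subtask 1 system can be illustrated with a minimal sketch. This is not the authors' exact pipeline: it omits the DistilBERT component and the real feature extraction, uses plain TF-IDF features, and all function and variable names are illustrative. It shows only the core step of averaging per-label probabilities from a logistic regression and a decision tree, then thresholding to obtain multi-label predictions.

```python
# Hedged sketch of a classifier-fusion step for multi-label persuasion
# detection (TF-IDF features stand in for the paper's actual inputs).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

def fuse_predict(train_texts, train_labels, new_texts, threshold=0.5):
    """Average per-label probabilities from two classifiers, then threshold."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(train_texts)
    # One binary classifier per persuasion-technique label.
    lr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, train_labels)
    dt = OneVsRestClassifier(DecisionTreeClassifier()).fit(X, train_labels)
    Xn = vec.transform(new_texts)
    probs = (lr.predict_proba(Xn) + dt.predict_proba(Xn)) / 2
    return (probs >= threshold).astype(int)

# Toy usage with two hypothetical labels.
texts = ["they always lie", "vote for freedom", "just a normal day", "everyone agrees"]
labels = np.array([[1, 0], [0, 1], [0, 0], [1, 1]])
preds = fuse_predict(texts, labels, ["they lie always"])
print(preds.shape)  # (1, 2): one input, one 0/1 decision per label
```

In the paper's setting, a third probability stream from the fine-tuned DistilBERT would be averaged in the same way before thresholding.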
