A Multi-Modal Dataset for Hate Speech Detection on Social Media: Case-study of Russia-Ukraine Conflict

This paper presents a new multi-modal dataset for identifying hateful content on social media, consisting of 5,680 text-image pairs collected from Twitter and annotated with two labels. Experimental analysis of the dataset shows that understanding both modalities is essential for detecting such content, a finding confirmed by our experiments with several state-of-the-art multi-modal models. In future work, we plan to enlarge the dataset. We further plan to develop new multi-modal models tailored explicitly to hate-speech detection, aiming for a deeper understanding of the relation between text and image. It would also be interesting to explore which social entities a given hateful tweet targets.
