A Feature Extraction Based Model for Hate Speech Identification

Detecting hate speech online has become an important task, because offensive language, including hurtful, obscene and insulting content, can harm marginalized people or groups. This paper presents the TU Berlin team's experiments and results on subtasks 1A and 1B of the shared task on Hate Speech and Offensive Content Identification in Indo-European Languages 2021. We evaluate the performance of different Natural Language Processing models on each subtask over the course of the competition. On the dataset provided by the competition, we tested recurrent neural network models at the word and character levels, as well as transfer learning approaches based on BERT. Among the models tested, the transfer learning-based models achieved the best results in both subtasks.
