Overview of the HASOC Subtrack at FIRE 2021: Conversational Hate Speech Detection in Code-mixed language

This paper presents an overview of a newly developed subtask offered at the Forum for Information Retrieval Evaluation (FIRE'21) on detecting contextual hate in conversational dialogue on social media. Identification of Conversational Hate-Speech in Code-Mixed Languages (ICHCL) was offered as subtask 2 of the HASOC English and Indo-Aryan Languages subtrack under the main HASOC track. The objective of the ICHCL subtask is to identify posts that appear normal when read in isolation but may be judged hateful, profane, or offensive once the surrounding conversation is taken into account. The subtask is framed as a binary classification of such contextual posts. The dataset was sampled from Twitter: around 7000 code-mixed posts in English and Hindi were downloaded and annotated using an annotation platform developed for this task. A total of 15 teams from across the world participated and submitted 50 runs. The Macro F1 score was used as the primary evaluation metric, and the best-performing team reported a Macro F1 score of around 0.74. The results show that considering the context can improve the performance of classification methods. ICHCL can contribute to identifying the best methods for this task.
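Because Macro F1 is the primary metric, it may help to see how it is computed for a binary task. The following is a minimal sketch, not the organizers' evaluation script: the function name `macro_f1` and the label strings `"HOF"` and `"NONE"` are assumptions chosen for illustration. Macro F1 computes the F1 score separately for each class and then takes the unweighted mean, so the minority class counts as much as the majority class.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    # Score every label that occurs in the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold labels and predictions; the label names are assumptions.
gold = ["HOF", "HOF", "NONE", "NONE"]
pred = ["HOF", "NONE", "NONE", "NONE"]
print(f"Macro F1: {macro_f1(gold, pred):.4f}")
```

In this toy example the per-class F1 scores are 2/3 for `"HOF"` and 4/5 for `"NONE"`, and the Macro F1 is their plain average.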
