Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

The HASOC track is dedicated to evaluating technology for identifying offensive language and hate speech. HASOC is building a multilingual data corpus, mainly for English and the under-resourced languages Hindi and Marathi. This paper presents one HASOC subtrack with two tasks. In 2021, we organized classification tasks for English, Hindi, and Marathi. Task 1 consists of two classification subtasks: Subtask 1A is a binary classification of tweets into offensive and non-offensive, and Subtask 1B is a fine-grained classification of tweets into hate, profane, and offensive. Task 2 consists of identifying hateful and offensive tweets given additional context in the form of the preceding conversation. During the shared task, 65 teams submitted 652 runs. This overview paper briefly presents the task descriptions, the data, and the results obtained from the participants' submissions.
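The relationship between the subtasks can be made concrete with a small sketch of the label schemes. It assumes the label codes HOF/NOT for Subtask 1A and HATE/OFFN/PRFN for Subtask 1B, plus a fourth NONE class in Subtask 1B for tweets that are neither hateful, offensive, nor profane; these codes are assumptions for illustration. The helper `with_context` is hypothetical and shows one possible way to turn a Task 2 instance (preceding conversation plus target tweet) into a single text input; it is not the organizers' or any participant's method.

```python
from enum import Enum


class FineLabel(Enum):
    """Subtask 1B classes. The codes and the NONE class are assumptions."""
    HATE = "HATE"  # hate speech
    OFFN = "OFFN"  # offensive
    PRFN = "PRFN"  # profane
    NONE = "NONE"  # neither hateful, offensive, nor profane


class BinaryLabel(Enum):
    """Subtask 1A classes (codes assumed)."""
    HOF = "HOF"  # hate, offensive, or profane
    NOT = "NOT"  # non-offensive


def to_binary(label: FineLabel) -> BinaryLabel:
    """Collapse a fine-grained Subtask 1B label into the binary Subtask 1A label."""
    return BinaryLabel.NOT if label is FineLabel.NONE else BinaryLabel.HOF


def with_context(preceding: list[str], tweet: str, sep: str = " [SEP] ") -> str:
    """Hypothetical Task 2 input: the preceding conversation, oldest first,
    followed by the tweet to classify, joined by a separator string."""
    return sep.join([*preceding, tweet])


if __name__ == "__main__":
    # Print how each fine-grained class maps onto the binary task.
    for fine in FineLabel:
        print(fine.value, "->", to_binary(fine).value)
    print(with_context(["original post", "first reply"], "reply to classify"))
```

Because every hateful, offensive, or profane tweet is offensive in the binary sense, the output of a fine-grained classifier can be mapped onto the binary subtask with `to_binary`; the reverse mapping is not possible, since a binary HOF label does not say which fine-grained class applies.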
