An Ensemble Approach for Hate and Offensive Language Identification in English and Indo-Aryan Languages

The freedom to post content and the lack of effective monitoring on social media have led to societal problems such as cyberbullying, offensive content, and hate speech. As a result, identifying hate and abusive language on social media has become an active research topic. This work proposes an ensemble-based model for detecting hate and offensive language in English and Hindi social media posts. The model combines support vector machine, logistic regression, random forest, gradient boosting, and AdaBoost classifiers. Word-level n-gram features performed well on the English dataset, with macro F1-scores of 0.79 and 0.59 on the two tasks, while character-level n-gram features performed well on the Hindi dataset, with macro F1-scores of 0.75 and 0.47 on the two tasks.
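The ideas named in the abstract (word-level versus character-level n-grams, combining several classifiers, and the macro F1-score) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The abstract does not say how the five classifiers are combined, so hard majority voting is an assumption here, and the function names, the whitespace tokenizer, and the labels "HOF" and "NOT" are hypothetical choices made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Word-level n-grams; whitespace tokenization is an assumption."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character-level n-grams over the lowercased string."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def majority_vote(predictions):
    """Combine per-classifier label lists by hard majority vote.

    Assumed combination rule (not stated in the abstract). With an odd
    number of classifiers and two labels, ties cannot occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(word_ngrams("You are so rude", 2))
    print(char_ngrams("hate", 3))
    # Three hypothetical classifiers voting on two posts.
    votes = [["HOF", "NOT"], ["HOF", "HOF"], ["NOT", "NOT"]]
    print(majority_vote(votes))
    print(macro_f1(["HOF", "HOF", "NOT", "NOT"], ["HOF", "NOT", "NOT", "NOT"]))
```

Because macro F1 averages the per-class scores without weighting by class frequency, weak performance on a minority class lowers the overall score noticeably. This can be relevant when reading scores on imbalanced hate-speech data.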
