Advances in Machine Learning Algorithms for Hate Speech Detection in Social Media: A Review

The aim of this paper is to review machine learning (ML) algorithms and techniques for hate speech detection in social media (SM). The hate speech problem is normally modeled as a text classification task. In this study, we examined the baseline components of hate speech classification using ML algorithms. Five baseline components were reviewed: data collection and exploration, feature extraction, dimensionality reduction, classifier selection and training, and model evaluation. The ML algorithms employed for hate speech detection have improved over time, and new datasets and different performance metrics have been proposed in the literature. Keeping researchers informed of these trends in automatic hate speech detection calls for a comprehensive and up-to-date review of the state of the art. The contributions of this study are threefold. First, it equips readers with the necessary information on the critical steps involved in hate speech detection using ML algorithms. Second, the strengths and weaknesses of each method are critically evaluated to guide researchers facing the dilemma of algorithm choice. Lastly, some research gaps and open challenges are identified. The variants of ML techniques reviewed include classical ML, ensemble approaches, and deep learning methods. Researchers and professionals alike will benefit immensely from this study.
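The five baseline components can be illustrated end to end with a short standard-library Python sketch. It is a minimal example under stated assumptions, not the pipeline of any study reviewed here. The toy posts are invented placeholders, with "SLUR" standing in for an abusive term. Keeping the k most document-frequent terms is one simple choice of dimensionality reduction, and multinomial Naive Bayes is one representative of the classical ML classifiers. All function names (tokenize, select_vocabulary, extract_features, train_nb, predict, evaluate) are hypothetical.

```python
import math
from collections import Counter

# Stage 1 (data collection): hypothetical toy posts, NOT a real dataset.
# Label 1 = hateful, 0 = not hateful; "SLUR" is a placeholder abusive term.
TRAIN = [
    ("SLUR people are vermin", 1),
    ("send the SLUR people away", 1),
    ("SLUR are vermin and trash", 1),
    ("the game was great", 0),
    ("great people great game", 0),
    ("the weather is great today", 0),
]
TEST = [
    ("those SLUR are vermin", 1),
    ("what a great game", 0),
    ("the SLUR at the game", 1),
    ("great people", 0),
]


def tokenize(text):
    # Exploration/preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def select_vocabulary(docs, k):
    # Stage 3 (dimensionality reduction): keep the k terms with the highest
    # document frequency, breaking ties alphabetically for determinism.
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return {term for term, _ in ranked[:k]}


def extract_features(doc, vocab):
    # Stage 2 (feature extraction): bag-of-words counts over the vocabulary.
    return Counter(term for term in doc if term in vocab)


def train_nb(features, labels, vocab, alpha=1.0):
    # Stage 4 (classifier training): multinomial Naive Bayes with
    # add-alpha (Laplace) smoothing, stored as log-probabilities.
    model = {}
    for c in set(labels):
        counts = Counter()
        for f, y in zip(features, labels):
            if y == c:
                counts.update(f)
        total = sum(counts.values())
        prior = math.log(labels.count(c) / len(labels))
        likelihood = {
            t: math.log((counts[t] + alpha) / (total + alpha * len(vocab)))
            for t in vocab
        }
        model[c] = (prior, likelihood)
    return model


def predict(model, f):
    # Choose the class with the highest log-posterior score.
    def score(c):
        prior, likelihood = model[c]
        return prior + sum(n * likelihood[t] for t, n in f.items())
    return max(model, key=score)


def evaluate(y_true, y_pred, positive=1):
    # Stage 5 (model evaluation): precision, recall and F1 for the hate class.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


train_docs = [tokenize(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
vocab = select_vocabulary(train_docs, k=7)
model = train_nb([extract_features(d, vocab) for d in train_docs], train_labels, vocab)
y_true = [label for _, label in TEST]
y_pred = [predict(model, extract_features(tokenize(t), vocab)) for t, _ in TEST]
print(y_pred)                    # [1, 0, 0, 0]
print(evaluate(y_true, y_pred))  # precision 1.0, recall 0.5, f1 about 0.667
```

On this toy data the classifier misses "the SLUR at the game". The frequent non-hateful context words outweigh the single abusive token, so recall drops to 0.5 while precision stays at 1.0. This is why precision, recall, and F1 are usually reported together rather than accuracy alone. In practice, scikit-learn components such as `TfidfVectorizer`, `SelectKBest` with `chi2`, and `MultinomialNB` would typically replace these hand-written stages.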
