Abusive Text Examination Using Latent Dirichlet Allocation, Self Organizing Maps and K Means Clustering

The widespread misuse of social media to disseminate hate speech targeting particular individuals, communities, religions, races, sexes or castes has prompted researchers across the world to formulate strategies and methodologies to counter this menace. Without the detection and analysis of hate speech, social media cannot be kept free of malicious content. This paper proposes a methodology that combines a popular topic modeling technique, Latent Dirichlet Allocation (LDA), with an unsupervised machine learning technique, self-organizing maps (SOM), to analyze hate speech spread over social media. This method is compared with K-means clustering applied after LDA. Used together, the two techniques provide a powerful analysis. The proposed LDA model produced ten topics as features and had a low perplexity with a higher negative log-likelihood score.
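The clustering stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the document-topic vectors are hypothetical stand-ins for LDA output (three topics instead of ten, for brevity), and the grid size, learning rate, neighborhood width and farthest-first K-means initialization are all assumptions chosen to keep the example small and deterministic.

```python
import math
import random

# Hypothetical document-topic vectors (each row sums to 1), standing in for
# the per-document topic distributions an LDA model would produce.
DOC_TOPICS = [
    [0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10], [0.15, 0.75, 0.10], [0.10, 0.70, 0.20],
    [0.10, 0.10, 0.80], [0.10, 0.15, 0.75], [0.20, 0.10, 0.70],
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def kmeans(data, k, iters=20):
    """Lloyd's K-means with farthest-first initialization (an assumption)."""
    centers = [list(data[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in data]
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in data], centers


def train_som(data, rows=2, cols=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM; returns grid coordinates and weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in range(dim)] for _ in grid]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 0.01   # neighborhood shrinks, stays > 0
        for p in rng.sample(data, len(data)):
            bmu = nearest(p, weights)        # best-matching unit
            for i, node in enumerate(grid):
                # Gaussian neighborhood measured on the grid, not in data space.
                h = math.exp(-sq_dist(node, grid[bmu]) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (x - w) for w, x in zip(weights[i], p)]
    return grid, weights


kmeans_labels, kmeans_centers = kmeans(DOC_TOPICS, k=3)
grid, som_weights = train_som(DOC_TOPICS)
som_cells = [grid[nearest(p, som_weights)] for p in DOC_TOPICS]

print("K-means labels:", kmeans_labels)
print("SOM grid cells:", som_cells)
```

K-means assigns each document a flat cluster label, whereas the SOM places each document on a two-dimensional grid in which neighboring cells hold similar topic mixtures. This difference is one plausible reason to compare the two methods on the same LDA features.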
