Neural Network Hate Deletion: Developing a Machine Learning Model to Eliminate Hate from Online Comments

We propose a method for transforming hateful online comments into non-hateful ones without sacrificing their understandability or original meaning. To accomplish this, we retrieve and classify 301,153 hateful and 1,041,490 non-hateful comments from the Facebook and YouTube channels of a large international media organization that is the target of considerable online hate. We supplement this dataset with 10,000 Reddit comments manually labeled for hatefulness. Using these two datasets, we train a neural network to distinguish the linguistic patterns of hateful and non-hateful comments. The model we develop, Neural Network Hate Deletion (NNHD), computes a hatefulness score for each sentence of a social media comment and, when a sentence's score exceeds a given threshold, deletes its hateful parts using the sentence's dependency tree. We evaluate the results by comparing crowd workers' perceptions of hatefulness and understandability before and after the transformation and find that our method reduces hatefulness without a significant loss of understandability. In some cases, removing hateful elements even improves understandability by reducing the linguistic complexity of the comment. In addition, we find that NNHD retains the original meaning satisfactorily on average, although not in every case. In terms of practical implications, NNHD could be used on social media platforms to suggest more neutral language to agitated users.
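
The threshold-and-delete step can be illustrated with a small sketch. It is not the NNHD implementation, and several parts are assumptions. The toy lexicon `TOY_LEXICON` is a hypothetical stand-in for the trained neural network. Sentence hatefulness is taken to be the maximum token score. The hypothetical helpers `subtree` and `delete_hate` remove the dependency subtree headed by each hateful word, and they drop the whole sentence if the hateful word is its root. Heads follow the CoNLL-U convention, in which a head of 0 marks the sentence root.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Token:
    idx: int   # 1-based position of the word in the sentence
    form: str  # surface form of the word
    head: int  # index of the syntactic head; 0 marks the root (CoNLL-U style)


# Hypothetical stand-in for the trained classifier: per-token scores in [0, 1].
TOY_LEXICON = {"idiotic": 0.9, "stupid": 0.8}


def token_score(tok: Token) -> float:
    return TOY_LEXICON.get(tok.form.lower(), 0.0)


def sentence_score(tokens: list[Token]) -> float:
    # Assumption: a sentence is as hateful as its most hateful token.
    return max((token_score(t) for t in tokens), default=0.0)


def subtree(tokens: list[Token], root_idx: int) -> set[int]:
    """Indices of root_idx and all of its descendants in the dependency tree."""
    children: dict[int, list[int]] = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.idx)
    found: set[int] = set()
    stack = [root_idx]
    while stack:
        i = stack.pop()
        found.add(i)
        stack.extend(children.get(i, []))
    return found


def delete_hate(tokens: list[Token], threshold: float = 0.5) -> str:
    """Remove hateful subtrees from a sentence scoring above threshold (sketch)."""
    if sentence_score(tokens) <= threshold:
        return " ".join(t.form for t in tokens)  # not hateful enough: unchanged
    drop: set[int] = set()
    for t in tokens:
        if token_score(t) > threshold:
            if t.head == 0:
                return ""  # the hateful word heads the sentence: drop it whole
            drop |= subtree(tokens, t.idx)
    return " ".join(t.form for t in tokens if t.idx not in drop)


# "This really stupid article lies": "really" depends on "stupid",
# which modifies "article", so both words are pruned together.
sent = [
    Token(1, "This", 4),
    Token(2, "really", 3),
    Token(3, "stupid", 4),
    Token(4, "article", 5),
    Token(5, "lies", 0),
]
print(delete_hate(sent))  # -> This article lies
```

Deleting whole subtrees, rather than single words, is meant to keep the remaining sentence grammatical. In the example, removing only "stupid" would leave the dangling modifier "really".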