Developing an online hate classifier for multiple social media platforms

The proliferation of social media enables people to express their opinions widely online. However, it has also given rise to conflict and hate, making online environments uninviting for users. Although researchers have found that hate is a problem across multiple platforms, there is a lack of online hate detection models trained on multi-platform data. To address this research gap, we collect a total of 197,566 comments from four platforms: YouTube, Reddit, Wikipedia, and Twitter, with 80% of the comments labeled as non-hateful and the remaining 20% labeled as hateful. We then experiment with several classification algorithms (Logistic Regression, Naïve Bayes, Support Vector Machines, XGBoost, and Neural Networks) and feature representations (Bag-of-Words, TF-IDF, Word2Vec, BERT, and their combination). While all the models significantly outperform the keyword-based baseline classifier, XGBoost using all features performs best (F1 = 0.92). Feature importance analysis indicates that BERT features have the greatest impact on the predictions. The findings support the generalizability of the best model, as its platform-specific results on Twitter and Wikipedia are comparable to those reported in the respective source papers. We make our code publicly available for use in real software systems and for further development by online hate researchers.
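The abstract compares trained classifiers against a keyword-based baseline using F1, with feature representations that include Bag-of-Words, TF-IDF, and their combination. The sketch below illustrates those three ingredients in plain Python. It is a minimal sketch under stated assumptions, not the paper's implementation: the lexicon `HATE_KEYWORDS`, the regex tokenizer, the smoothed IDF formula, and "combination" as simple vector concatenation are all hypothetical choices made for illustration. The Word2Vec and BERT features and the XGBoost model are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon for illustration only; not the paper's keyword list.
HATE_KEYWORDS = {"vermin", "scum", "idiot"}


def tokenize(text):
    """Lowercase and keep runs of letters/apostrophes (a simplistic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_baseline(comment, lexicon=HATE_KEYWORDS):
    """Label a comment 1 (hateful) if any token is in the lexicon, else 0."""
    return int(any(tok in lexicon for tok in tokenize(comment)))


def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the positive (hateful) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 treated as 0 here by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bow_tfidf_features(docs):
    """Return (vocab, rows): each row is BoW counts concatenated with TF-IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (assumed variant): log((1 + N) / (1 + df)) + 1
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        bow = [counts[tok] for tok in vocab]
        tfidf = [counts[tok] * idf[tok] for tok in vocab]
        rows.append(bow + tfidf)  # assumed "combination": plain concatenation
    return vocab, rows


comments = [
    "You are vermin",
    "Great video, thanks",
    "Go back where you came from",
    "That idiot ref cost us the game",
]
labels = [1, 0, 1, 0]  # toy labels for illustration only
preds = [keyword_baseline(c) for c in comments]
print(preds)                    # [1, 0, 0, 1]
print(f1_score(labels, preds))  # 0.5
vocab, rows = bow_tfidf_features(comments)
print(len(rows[0]) == 2 * len(vocab))  # True
```

The toy run shows the two failure modes that motivate learned models over a keyword list: the baseline misses the hateful comment that contains no lexicon word, and it flags an insult that the toy labels treat as non-hateful.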
