Countering hate on social media: Large scale classification of hate and counter speech

Hateful rhetoric is plaguing online discourse, fostering extreme societal movements and possibly giving rise to real-world violence. A potential solution to this growing global problem is citizen-generated counter speech, in which citizens actively engage in hate-filled conversations to try to restore civil, non-polarized discourse. However, its actual effectiveness in curbing the spread of hatred is unknown and hard to quantify. One major obstacle to researching this question is the lack of large labeled data sets for training automated classifiers to identify counter speech. Here we made use of a unique situation in Germany, where self-labeling groups engaged in organized online hate and counter speech. We used an ensemble learning algorithm that pairs a variety of paragraph embeddings with regularized logistic regression functions to classify both hate and counter speech in a corpus of millions of relevant tweets from these two groups. Our pipeline achieved macro F1 scores ranging from 0.76 to 0.97 on out-of-sample balanced test sets, an accuracy in line with, and in some cases exceeding, the state of the art. On thousands of tweets, we used crowdsourcing to verify that the classifier's judgments align closely with human judgment. We then used the classifier to discover hate and counter speech in more than 135,000 fully resolved Twitter conversations occurring from 2013 to 2018 and to study their frequency and interaction. Altogether, our results highlight the potential of automated methods to evaluate the impact of coordinated counter speech in stabilizing conversations on social media.
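
For readers unfamiliar with the macro F1 score reported above, the following is a minimal sketch of how it can be computed: the F1 score is calculated separately for each class and the per-class scores are averaged with equal weight. The function name `macro_f1` and the toy labels are illustrative assumptions, not the study's evaluation code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, made-up labels on a balanced two-class test set
y_true = ["hate", "hate", "counter", "counter"]
y_pred = ["hate", "counter", "counter", "counter"]
score = macro_f1(y_true, y_pred)
```

Because each class contributes equally to the average, macro F1 does not reward a classifier for doing well on only one of the two classes.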
