Are Your Friends Also Haters? Identification of Hater Networks on Social Media: Data Paper

Hate speech on social media platforms has become a severe problem in recent years. To counter it, researchers have developed machine learning-based classification models, but because of the complexity of the problem, these models are still far from perfect. A promising way to improve them is to integrate social network data as additional features in the classification. Unfortunately, datasets that contain both text and social network data are lacking, which makes this approach hard to investigate. We therefore develop an approach to identify and collect hater networks on Twitter that uses a pre-trained classification model to focus on hateful content. The contributions of this article are (1) an approach to identify hater networks and (2) an anonymized German offensive language dataset that includes social network data. The dataset consists of 4,647,200 labeled tweets and a social graph with 49,353 users and 122,053 edges.
