Toxic, Hateful, Offensive or Abusive? What Are We Really Classifying? An Empirical Analysis of Hate Speech Datasets

The field of the automatic detection of hate speech and related concepts has raised a lot of interest in the last years. Different datasets were annotated and classified by means of applying different machine learning algorithms. However, few efforts were done in order to clarify the applied categories and homogenize different datasets. Our study takes up this demand. We analyze six different publicly available datasets in this field with respect to their similarity and compatibility. We conduct two different experiments. First, we try to make the datasets compatible and represent the dataset classes as Fast Text word vectors analyzing the similarity between different classes in a intra and inter dataset manner. Second, we submit the chosen datasets to the Perspective API Toxicity classifier, achieving different performances depending on the categories and datasets. One of the main conclusions of these experiments is that many different definitions are being used for equivalent concepts, which makes most of the publicly available datasets incompatible. Grounded in our analysis, we provide guidelines for future dataset collection and annotation.

[1]  Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space , 1901 .

[2]  Preslav Nakov,et al.  Predicting the Type and Target of Offensive Posts in Social Media , 2019, NAACL.

[3]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[4]  Björn Gambäck,et al.  Studying Generalisability across Abusive Language Detection Datasets , 2019, CoNLL.

[5]  Preslav Nakov,et al.  SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval) , 2019, *SEMEVAL.

[6]  Barbara Poblete,et al.  Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation , 2019, SIGIR.

[7]  Alfredo Milani,et al.  Detecting Hate Speech for Italian Language in Social Media , 2018, EVALITA@CLiC-it.

[8]  Sérgio Nunes,et al.  Stop PropagHate at SemEval-2019 Tasks 5 and 6: Are abusive language classification results reproducible? , 2019, *SEMEVAL.

[9]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[10]  Paolo Rosso,et al.  SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter , 2019, *SEMEVAL.

[11]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[12]  Paolo Rosso,et al.  Automatic Identification and Classification of Misogynistic Language on Twitter , 2018, NLDB.

[13]  Sérgio Nunes,et al.  A Hierarchically-Labeled Portuguese Hate Speech Dataset , 2019, Proceedings of the Third Workshop on Abusive Language Online.

[14]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[15]  Paolo Rosso,et al.  Overview of the Evalita 2018 Task on Automatic Misogyny Identification (AMI) , 2018, EVALITA@CLiC-it.

[16]  Ingmar Weber,et al.  Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.

[17]  Ritesh Kumar,et al.  Benchmarking Aggression Identification in Social Media , 2018, TRAC@COLING 2018.

[18]  Tomas Mikolov,et al.  Advances in Pre-Training Distributed Word Representations , 2017, LREC.