Hate Speech Detection is Not as Easy as You May Think: A Closer Look at Model Validation

Hate speech is an important problem that is seriously affecting the dynamics and usefulness of online social communities. Large scale social platforms are currently investing important resources into automatically detecting and classifying hateful content, without much success. On the other hand, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance but only within specific datasets. In this work, we analyze this apparent contradiction between existing literature and actual applications. We study closely the experimental methodology used in prior work and their generalizability to other datasets. Our findings evidence methodological issues, as well as an important dataset bias. As a consequence, performance claims of the current state-of-the-art have become significantly overestimated. The problems that we have found are mostly related to data overfitting and sampling issues. We discuss the implications for current research and re-conduct experiments to give a more accurate picture of the current state-of-the art methods.

[1]  Pascale Fung,et al.  One-step and Two-step Classification for Abusive Language Detection on Twitter , 2017, ALW@ACL.

[2]  Julien Cornebise,et al.  A large-scale crowdsourced analysis of abuse against women journalists and politicians on Twitter , 2019, ArXiv.

[3]  David Robinson,et al.  Detecting Hate Speech on Twitter Using a Convolution-GRU Based Deep Neural Network , 2018, ESWC.

[4]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[5]  Georges Linarès,et al.  Graph-Based Features for Automatic Online Abuse Detection , 2017, SLSP.

[6]  Gianluca Stringhini,et al.  Mean Birds: Detecting Aggression and Bullying on Twitter , 2017, WebSci.

[7]  Kelly Reynolds,et al.  Using Machine Learning to Detect Cyberbullying , 2011, 2011 10th International Conference on Machine Learning and Applications and Workshops.

[8]  Amit Awekar,et al.  Deep Learning for Detecting Cyberbullying Across Multiple Social Media Platforms , 2018, ECIR.

[9]  Mike Zhang,et al.  Grunn2019 at SemEval-2019 Task 5: Shared Task on Multilingual Detection of Hate , 2019, SemEval@NAACL-HLT.

[10]  Vasudeva Varma,et al.  Deep Learning for Hate Speech Detection in Tweets , 2017, WWW.

[11]  Kai Eckert,et al.  Cyberbullying Detection in Social Networks Using Deep Learning Based Models; A Reproducibility Study , 2018, DaWaK.

[12]  Yoav Goldberg,et al.  A Primer on Neural Network Models for Natural Language Processing , 2015, J. Artif. Intell. Res..

[13]  Tomas Mikolov,et al.  Bag of Tricks for Efficient Text Classification , 2016, EACL.

[14]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[15]  Dolf Trieschnigg,et al.  Experts and Machines against Bullies: A Hybrid Approach to Detect Cyberbullies , 2014, Canadian Conference on AI.

[16]  Lucas Dixon,et al.  Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.

[17]  Shivakant Mishra,et al.  Prediction of Cyberbullying Incidents on the Instagram Social Network , 2015, ArXiv.

[18]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[19]  Zeerak Waseem,et al.  Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter , 2016, NLP+CSS@EMNLP.

[20]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[21]  Michael Wiegand,et al.  A Survey on Hate Speech Detection using Natural Language Processing , 2017, SocialNLP@EACL.

[22]  Björn Gambäck,et al.  Using Convolutional Neural Networks to Classify Hate-Speech , 2017, ALW@ACL.

[23]  D. Lazer,et al.  Fake news on Twitter during the 2016 U.S. presidential election , 2019, Science.

[24]  Ingmar Weber,et al.  Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.

[25]  Mauro Conti,et al.  All You Need is "Love": Evading Hate Speech Detection , 2018, ArXiv.

[26]  Vivek K. Singh,et al.  Toward Multimodal Cyberbullying Detection , 2017, CHI Extended Abstracts.

[27]  Sérgio Nunes,et al.  A Survey on Automatic Detection of Hate Speech in Text , 2018, ACM Comput. Surv..

[28]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[29]  Yulan He,et al.  Approaches to Automated Detection of Cyberbullying: A Survey , 2020, IEEE Transactions on Affective Computing.

[30]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.