Directions in abusive language training data, a systematic review: Garbage in, garbage out

Data-driven and machine learning-based approaches for detecting, categorising and measuring abusive content such as hate speech and harassment have gained traction due to their scalability, robustness and increasingly high performance. Making effective detection systems for abusive content relies on having the right training datasets, reflecting a widely accepted mantra in computer science: Garbage In, Garbage Out. However, creating training datasets which are large, varied, theoretically informed and that minimise biases is difficult, laborious and requires deep expertise. This paper systematically reviews 63 publicly available training datasets which have been created to train abusive language classifiers. It also reports on the creation of a dedicated website for cataloguing abusive language data, hatespeechdata.com. We discuss the challenges and opportunities of open science in this field, and argue that although more dataset sharing would bring many benefits, it also poses social and ethical risks which need careful consideration. Finally, we provide evidence-based recommendations for practitioners creating new abusive content training datasets.
