Introducing CAD: the Contextual Abuse Dataset

Online abuse can inflict harm on users and communities, making online spaces unsafe and toxic. Progress in automatically detecting and classifying abusive content is often held back by the lack of high-quality, detailed datasets. We introduce a new dataset of primarily English Reddit entries which addresses several limitations of prior work. It (1) contains six conceptually distinct primary categories as well as secondary categories, (2) has labels annotated in the context of the conversation thread, (3) contains rationales, and (4) uses an expert-driven group-adjudication process for high-quality annotations. We report several baseline models to benchmark the work of future researchers. The annotated dataset, annotation guidelines, models and code are freely available.
