A Unified Taxonomy of Harmful Content

The ability to recognize harmful content within online communities has become a focus for researchers, engineers, and policymakers seeking to protect users from abuse. While the number of datasets aiming to capture forms of abuse has grown in recent years, the community has not standardized how the various harmful behaviors are defined, which creates challenges for reliable moderation, modeling, and evaluation. As a step toward a shared understanding of how online abuse may be modeled, we synthesize the most common types of abuse described by industry, policy, community, and health experts into a unified typology of harmful content, with detailed criteria and exceptions for each type of abuse.
