Generating Counter Narratives against Online Hate Speech: Data and Strategies

Recently, research has started to focus on avoiding the undesired effects of content moderation, such as censorship and overblocking, when dealing with online hatred. The core idea is to intervene directly in the discussion with textual responses that counter the hate content and prevent it from spreading further. Accordingly, automation strategies such as natural language generation are beginning to be investigated. Still, these strategies suffer from a lack of sufficient high-quality data and tend to produce generic or repetitive responses. Aware of these limitations, we present a study on how to collect responses to hate effectively, employing large-scale unsupervised language models such as GPT-2 to generate silver data, and on the best annotation strategies and neural architectures for filtering these data before expert validation and post-editing.
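The generate-then-filter idea described above can be sketched as a small pipeline step: candidate hate speech/counter narrative pairs (silver data, e.g. produced by a language model such as GPT-2) are scored, and only those above a threshold are passed on to experts for validation and post-editing. This is a minimal sketch under stated assumptions, not the authors' implementation. The names `Candidate`, `filter_silver_data`, and `toy_scorer`, the 0.5 threshold, and the scoring rule are all hypothetical. In the study, the filter is a trained neural classifier, which `toy_scorer` merely stands in for.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Candidate:
    """A generated (silver) hate speech / counter narrative pair with its filter score."""
    hate_speech: str
    counter_narrative: str
    score: float = 0.0


def filter_silver_data(
    pairs: Iterable[Tuple[str, str]],
    scorer: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Candidate]:
    """Keep pairs whose score reaches the threshold, best first.

    The returned candidates are the ones that would go on to expert
    validation and post-editing; everything else is discarded.
    """
    kept = [
        Candidate(hs, cn, s)
        for hs, cn in pairs
        if (s := scorer(hs, cn)) >= threshold
    ]
    # Show the most promising candidates to the experts first.
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept


def toy_scorer(hs: str, cn: str) -> float:
    """HYPOTHETICAL placeholder for a trained quality classifier.

    Rejects responses that simply echo the hate speech and otherwise
    rewards longer responses, capped at 1.0.
    """
    if cn.strip().lower() == hs.strip().lower():
        return 0.0
    return min(len(cn.split()) / 20, 1.0)


# Placeholder inputs standing in for model-generated pairs.
pairs = [
    ("HS example A", "HS example A"),  # echo of the input: rejected
    ("HS example B", "This is a longer, reasoned reply that offers facts and counters the claim politely."),
    ("HS example C", "No."),  # too short: rejected
]

for c in filter_silver_data(pairs, toy_scorer, threshold=0.5):
    print(f"{c.score:.2f}  {c.hate_speech} -> {c.counter_narrative}")
```

Keeping the scorer as a plain callable means a real classifier can be swapped in without touching the filtering logic. The walrus operator requires Python 3.8 or later.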
