Toxic Comment Detection in Online Discussions

Comment sections of online news platforms are an essential space for expressing opinions and discussing political topics. In contrast to other online posts, news discussions relate to particular news articles, comments refer to each other, and individual conversations emerge. However, misuse by spammers, haters, and trolls makes costly content moderation necessary. Sentiment analysis can support moderation and also help to understand the dynamics of online discussions. One subtask of content moderation is the identification of toxic comments. To this end, we describe the concept of toxicity and characterize its subclasses. Further, we present various deep learning approaches, including datasets and architectures, tailored to sentiment analysis in online discussions. One way to make these approaches more comprehensible and trustworthy is to classify comments into fine-grained classes rather than binary ones. On the downside, more classes require more training data. Therefore, we propose to augment training data by using transfer learning. We discuss real-world applications, such as semi-automated comment moderation and troll detection. Finally, we outline future challenges and current limitations in light of the most recent research publications.
