LSTM Neural Networks for Transfer Learning in Online Moderation of Abuse Context

Recently, the impact of offensive language and derogatory speech to online discourse, motivated social media platforms to research effective moderation tools that safeguard internet access. However, automatically distilling and flagging inappropriate conversations for abuse remains a difficult and time consuming task. In this work, we propose an LSTM based neural model that transfers learning from a platform domain with a relatively large dataset to a domain much resource constraint, and improves the target performance of classifying toxic comments. Our model is pretrained on personal attack comments retrieved from a subset of discussions on Wikipedia, and tested to identify hate speech on annotated Twitter tweets. We achieved an F1 measure of 0.77, approaching performance of the in-domain model and outperforming out-domain baseline by about nine percentage points, without counseling the provided labels.

[1]  Alexei A. Efros,et al.  What makes ImageNet good for transfer learning? , 2016, ArXiv.

[2]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[3]  Rogers Prates de Pelle,et al.  Offensive Comments in the Brazilian Web: a dataset and baseline results , 2017 .

[4]  Franck Dernoncourt,et al.  Sequential Short-Text Classification with Recurrent and Convolutional Neural Networks , 2016, NAACL.

[5]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[6]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[7]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[8]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[9]  Julius Kunze,et al.  Transfer Learning for Speech Recognition on a Budget , 2017, Rep4NLP@ACL.

[10]  Lucas Dixon,et al.  Ex Machina: Personal Attacks Seen at Scale , 2016, WWW.

[11]  Yuzhou Wang,et al.  Locate the Hate: Detecting Tweets against Blacks , 2013, AAAI.

[12]  Bailey Poland,et al.  Haters: Harassment, Abuse, and Violence Online , 2016 .

[13]  Po-Sen Huang,et al.  Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension , 2017, EMNLP.

[14]  Björn Gambäck,et al.  Using Convolutional Neural Networks to Classify Hate-Speech , 2017, ALW@ACL.

[15]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[16]  Mark Sandler,et al.  Transfer Learning for Music Classification and Regression Tasks , 2017, ISMIR.

[17]  Elizabeth F. Churchill,et al.  Automatic identification of personal insults on social news sites , 2012, J. Assoc. Inf. Sci. Technol..

[18]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[19]  Theodore Chu,et al.  Comment Abuse Classification with Deep Learning , 2017 .

[20]  Deniz Yuret,et al.  Transfer Learning for Low-Resource Neural Machine Translation , 2016, EMNLP.

[21]  Girish Chowdhary,et al.  Cross-Domain Transfer in Reinforcement Learning Using Target Apprentice , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[22]  Xuanjing Huang,et al.  Adversarial Multi-task Learning for Text Classification , 2017, ACL.

[23]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[24]  Rui Yan,et al.  How Transferable are Neural Networks in NLP Applications? , 2016, EMNLP.

[25]  Minlie Huang,et al.  Modeling Rich Contexts for Sentiment Classification with LSTM , 2016, ArXiv.

[26]  Dirk Hovy,et al.  Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter , 2016, NAACL.

[27]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[28]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[29]  Young-Bum Kim,et al.  Cross-Lingual Transfer Learning for POS Tagging without Cross-Lingual Resources , 2017, EMNLP.

[30]  Manoj Kumar Chinnakotla,et al.  Convolutional Bi-directional LSTM for Detecting Inappropriate Query Suggestions in Web Search , 2017, PAKDD.

[31]  Björn Ross,et al.  Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis , 2016, ArXiv.

[32]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Barbara Plank,et al.  Learning to select data for transfer learning with Bayesian Optimization , 2017, EMNLP.

[34]  Xing Fan,et al.  Transfer Learning for Neural Semantic Parsing , 2017, Rep4NLP@ACL.

[35]  Derek Ruths,et al.  A Web of Hate: Tackling Hateful Speech in Online Social Spaces , 2017, ArXiv.

[36]  Chenhui Chu,et al.  An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation , 2017, ACL.

[37]  Sebastian Thrun,et al.  Dermatologist-level classification of skin cancer with deep neural networks , 2017, Nature.

[38]  Wang Ling,et al.  Generative and Discriminative Text Classification with Recurrent Neural Networks , 2017, ArXiv.

[39]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[40]  John Pavlopoulos,et al.  Deep Learning for User Comment Moderation , 2017, ALW@ACL.

[41]  Tomaz Erjavec,et al.  Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene , 2017, ALW@ACL.

[42]  Cícero Nogueira dos Santos,et al.  Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts , 2014, COLING.

[43]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[44]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[45]  Brian D. Davison,et al.  Detection of Harassment on Web 2.0 , 2009 .

[46]  Lee Rainie,et al.  The future of free speech, trolls, anonymity and fake news online , 2017 .

[47]  Ingmar Weber,et al.  Automated Hate Speech Detection and the Problem of Offensive Language , 2017, ICWSM.

[48]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[49]  Xiaoyan Zhu,et al.  Linguistically Regularized LSTM for Sentiment Classification , 2016, ACL.

[50]  Yoshua Bengio,et al.  Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.

[51]  Trevor Darrell,et al.  Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).