A little goes a long way: Improving toxic language classification despite data scarcity

Detection of some types of toxic language is hampered by extreme scarcity of labeled training data. Data augmentation, i.e., generating new synthetic data from a labeled seed dataset, can help. The efficacy of data augmentation on toxic language classification has not been fully explored. We present the first systematic study on how data augmentation techniques impact performance across toxic language classifiers, ranging from shallow logistic regression architectures to BERT, a state-of-the-art pre-trained Transformer network. We compare the performance of eight techniques on very scarce seed datasets. We show that while BERT performed the best, shallow classifiers performed comparably when trained on data augmented with a combination of three techniques, including GPT-2-generated sentences. We discuss the interplay of performance and computational overhead, which can inform the choice of techniques under different constraints.
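
To make the augmentation setup concrete, below is a minimal sketch, not the paper's exact pipeline, of one of the ideas described above: expanding a scarce labeled seed set with GPT-2-generated sentences and training a shallow logistic regression classifier on the enlarged data. The seed texts, labels, decoding parameters, and label-inheritance rule are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch: GPT-2-based data augmentation for a scarce seed set,
# followed by a shallow TF-IDF + logistic regression classifier.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

N_PER_SEED = 2  # number of synthetic continuations per seed example (assumed)

def augment_with_gpt2(seed_texts, n_per_seed=N_PER_SEED, max_new_tokens=30):
    """Generate synthetic variants by sampling GPT-2 continuations of each seed text."""
    synthetic = []
    for text in seed_texts:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model.generate(
            **inputs,
            do_sample=True,                       # sample rather than greedy decode
            top_p=0.9,
            max_new_tokens=max_new_tokens,
            num_return_sequences=n_per_seed,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
        )
        synthetic += [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    return synthetic

# Tiny illustrative seed set (placeholder labels: 1 = toxic, 0 = non-toxic).
seed_texts = ["you people are the worst", "thanks for the helpful answer"]
seed_labels = [1, 0]

aug_texts = augment_with_gpt2(seed_texts)
# Each synthetic sentence inherits the label of the seed it was generated from.
aug_labels = [lbl for lbl in seed_labels for _ in range(N_PER_SEED)]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(seed_texts + aug_texts, seed_labels + aug_labels)
```

In practice, augmented sentences may drift in meaning or label, so generated text is typically filtered or re-labeled before training; combining several augmentation techniques, as the abstract notes, tends to work better than any single one.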
