A little goes a long way: Improving toxic language classification despite data scarcity
Tommi Grondahl | Mika Juuti | N. Asokan | Adrian Flanagan