Data Augmentation using Pre-trained Transformer Models

Language-model-based pre-trained models such as BERT have provided significant gains across different NLP tasks. In this paper, we study different types of pre-trained transformer-based models, namely auto-regressive models (GPT-2), auto-encoder models (BERT), and seq2seq models (BART), for conditional data augmentation. We show that prepending the class label to text sequences provides a simple yet effective way to condition the pre-trained models for data augmentation. On three classification benchmarks, the pre-trained seq2seq model outperforms the other models. Further, we explore how data augmentation with different pre-trained models differs in terms of data diversity, and how well such methods preserve class-label information.
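
A minimal sketch of what label-prepended conditioning might look like in practice, assuming GPT-2 served through the HuggingFace transformers library; the prompt format (class label followed by a few seed words), the sampling settings, and the helper name augment are illustrative assumptions rather than the paper's exact setup:

# Sketch: conditional augmentation by prepending the class label to the
# generation prompt (assumed format; ideally the model is first fine-tuned
# on label-prepended training sentences before generating).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def augment(label, seed_text, num_samples=3, max_length=40):
    # Condition generation on the class by prepending the label to the prompt.
    prompt = f"{label} {seed_text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # sampling yields more diverse augmentations
        top_p=0.9,
        max_length=max_length,
        num_return_sequences=num_samples,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Drop the prompt so only the newly generated continuation is kept.
    return [tokenizer.decode(o, skip_special_tokens=True)[len(prompt):].strip()
            for o in outputs]

# Example: generate extra "positive"-class examples from a short seed.
print(augment("positive", "the movie was"))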
