Towards Zero-Label Language Learning

This paper explores zero-label learning in Natural Language Processing (NLP), in which no human-annotated data is used anywhere during training and models are trained purely on synthetic data. At the core of our framework is a novel approach that better leverages powerful pretrained language models. Specifically, inspired by the recent success of few-shot inference with GPT-3, we present a training data creation procedure named Unsupervised Data Generation (UDG), which leverages few-shot prompts to synthesize high-quality training data without real human annotations. Our method enables zero-label learning, as we train task-specific models solely on the synthetic data, yet we achieve results that are better than or comparable to those of strong baseline models trained on human-labeled data. Furthermore, when mixed with labeled data, our approach serves as a highly effective data augmentation procedure, achieving new state-of-the-art results on the SuperGLUE benchmark.
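
To make the UDG idea concrete, below is a minimal illustrative sketch of few-shot, prompt-based synthetic data generation for a sentiment-classification task. The prompt wording, the example review texts, and the `sample_from_lm` placeholder are assumptions for illustration only, not the paper's exact prompts or model interface; the key point is that the prompt contains only unlabeled example inputs plus a desired label, and the language model's continuation becomes a new (text, label) training pair.

```python
# Illustrative sketch of UDG-style few-shot synthetic data generation.
# Assumptions: the prompt template, example reviews, and sample_from_lm()
# are hypothetical stand-ins, not the paper's actual prompts or LM API.

import random

# Unlabeled example inputs used only as style demonstrations in the prompt.
FEW_SHOT_UNLABELED = [
    "The plot was predictable and the acting felt wooden.",
    "A heartfelt, beautifully shot film I would gladly watch again.",
]


def build_prompt(target_label: str, examples=FEW_SHOT_UNLABELED) -> str:
    """Assemble a few-shot prompt: unlabeled demonstrations followed by an
    instruction to write a new input belonging to `target_label`."""
    lines = ["Here are some movie reviews:", ""]
    for text in examples:
        lines.append(f"Review: {text}")
        lines.append("")
    lines.append(f"Write a {target_label} movie review.")
    lines.append("Review:")  # the LM's continuation is the synthetic example
    return "\n".join(lines)


def sample_from_lm(prompt: str) -> str:
    """Placeholder for a call to a large pretrained language model.
    Returns a canned string so this sketch runs offline."""
    return "An unforgettable story with performances that stay with you."


def generate_synthetic_dataset(labels, n_per_label: int = 2):
    """Create (text, label) pairs purely from LM completions; a task-specific
    model is then trained on this synthetic set alone (zero-label)."""
    data = []
    for label in labels:
        for _ in range(n_per_label):
            completion = sample_from_lm(build_prompt(label))
            data.append((completion.strip(), label))
    random.shuffle(data)
    return data


if __name__ == "__main__":
    for text, label in generate_synthetic_dataset(["positive", "negative"]):
        print(label, "->", text)
```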
