Generate, Annotate, and Learn: NLP with Synthetic Text

Semi-supervised learning (SSL) has seen success in many application domains, but this success often hinges on the availability of task-specific unlabeled data. Knowledge distillation (KD) has enabled effective optimization of compact neural networks, achieving the best results when the knowledge of an expensive teacher network is distilled via fresh task-specific unlabeled data. However, task-specific unlabeled data can be challenging to find, especially for NLP. We investigate the use of generative models for synthesizing unlabeled data and present a simple and general framework called "generate, annotate, and learn" (GAL). A language model (LM) is used to synthesize in-domain unlabeled data, a classifier is then used to annotate this data, and the synthetically generated and annotated data is finally used to advance SSL, KD, and few-shot learning on NLP and tabular tasks. To obtain a strong task-specific LM, we either fine-tune a large LM on inputs from a specific task, or prompt a large LM with a few input examples to conditionally generate more unlabeled examples. GAL also yields a new state-of-the-art for 6-layer transformers on the GLUE leaderboard. Finally, self-training with GAL offers large gains on four tabular tasks from the UCI repository.
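To make the pipeline concrete, the sketch below walks through one generate-annotate-learn round with off-the-shelf HuggingFace components. It is an illustration under stated assumptions, not the authors' implementation: the plain "gpt2" checkpoint stands in for a language model fine-tuned on the task's inputs, an SST-2 sentiment checkpoint stands in for the task-specific teacher, and the final student fine-tuning step is only indicated in a comment.

```python
# A minimal, illustrative sketch of the GAL loop (not the authors' released code).
# Assumptions: "gpt2" stands in for an LM fine-tuned on the task's unlabeled inputs,
# and an off-the-shelf SST-2 classifier stands in for the task-specific teacher.
from transformers import pipeline

# 1. Generate: sample in-domain text from the (fine-tuned) language model.
#    Conditioning on GPT-2's document separator approximates unconditional sampling.
generator = pipeline("text-generation", model="gpt2")
samples = generator("<|endoftext|>", max_new_tokens=48, do_sample=True,
                    num_return_sequences=8)
synthetic_inputs = [s["generated_text"].replace("<|endoftext|>", "").strip()
                    for s in samples]

# 2. Annotate: pseudo-label the synthetic inputs with the current classifier (teacher).
teacher = pipeline("text-classification",
                   model="distilbert-base-uncased-finetuned-sst-2-english")
pseudo_labels = [p["label"] for p in teacher(synthetic_inputs, truncation=True)]

# 3. Learn: mix the pseudo-labeled synthetic data with the real labeled set and
#    fine-tune the student as usual (a standard Trainer loop, omitted here).
for text, label in zip(synthetic_inputs, pseudo_labels):
    print(f"{label}\t{text[:60]}")
```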
