Self-Supervised Meta-Learning for Few-Shot Natural Language Classification Tasks

Self-supervised pre-training of transformer models has revolutionized NLP applications. Such pre-training with language modeling objectives provides a useful parameter initialization that generalizes well to new tasks with fine-tuning. However, fine-tuning is still data inefficient: when labeled examples are scarce, accuracy can be low. Data efficiency can be improved by optimizing pre-training directly for future fine-tuning with few examples; this can be treated as a meta-learning problem. However, standard meta-learning techniques require many training tasks in order to generalize, and finding a diverse set of such supervised tasks is usually difficult. This paper proposes a self-supervised approach that generates a large, rich distribution of meta-learning tasks from unlabeled text. Each task uses a cloze-style objective, but is cast as a separate multi-class classification problem in which the tokens to be blanked are drawn from only a handful of vocabulary terms. This yields as many unique meta-training tasks as there are subsets of vocabulary terms. We meta-train a transformer model on this task distribution using a recent meta-learning framework. On 17 NLP tasks, we show that this meta-training leads to better few-shot generalization than language-model pre-training followed by fine-tuning. Furthermore, we show how the self-supervised tasks can be combined with supervised tasks for meta-learning, providing substantial accuracy gains over previous supervised meta-learning.
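The task-construction recipe described above lends itself to a short illustration. The following is a minimal sketch (the function name, parameter names, and mask string are illustrative, not from the paper) of building one N-way, k-shot cloze classification task from unlabeled sentences: sample a small subset of vocabulary words, blank each word out of sentences that contain it, and use the identity of the blanked word as the class label.

import random
from collections import defaultdict

MASK = "[MASK]"  # placeholder mask string; the real token depends on the tokenizer

def build_cloze_task(sentences, num_classes=4, shots_per_class=8, min_freq=16, seed=0):
    """Create one self-supervised N-way classification task from unlabeled text.

    Each class corresponds to one vocabulary word; examples are sentences with
    that word blanked out, and the label is the identity of the blanked word.
    """
    rng = random.Random(seed)

    # Index sentences by the words they contain.
    word_to_sents = defaultdict(list)
    for sent in sentences:
        for word in set(sent.split()):
            word_to_sents[word].append(sent)

    # Candidate words must occur in enough sentences to supply the examples;
    # raises ValueError if fewer than num_classes candidates exist.
    candidates = [w for w, s in word_to_sents.items() if len(s) >= min_freq]
    chosen = rng.sample(candidates, num_classes)

    # Blank out the chosen word in each sentence; the label is the word's index.
    examples = []
    for label, word in enumerate(chosen):
        for sent in rng.sample(word_to_sents[word], shots_per_class):
            masked = " ".join(MASK if tok == word else tok for tok in sent.split())
            examples.append((masked, label))
    rng.shuffle(examples)
    return examples

Because any subset of vocabulary words defines a distinct task, repeatedly sampling different subsets yields a combinatorially large pool of unique meta-training tasks from the same unlabeled corpus.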
