Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference

Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with "task descriptions" in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, regular supervised training is performed on the resulting training set. For several tasks and languages, PET outperforms both supervised training and unsupervised approaches in low-resource settings by a large margin.
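To make the procedure concrete, the following is a minimal sketch of the cloze-reformulation step, assuming the Hugging Face transformers library and a masked language model such as roberta-base; the binary sentiment task, the "It was ___." pattern, and the terrible/great verbalizer are illustrative choices, not necessarily the paper's exact configuration. A pattern maps each input to a cloze phrase with one masked token, a verbalizer maps each label to a single token, and the language model's scores for those tokens at the mask position yield soft labels for unlabeled examples, which can then serve as targets for regular supervised training.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "roberta-base"  # any pretrained masked LM will do for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pattern(text: str) -> str:
    # Reformulate an input example as a cloze-style phrase with one masked slot.
    return f"{text} It was {tokenizer.mask_token}."

# Verbalizer: map each label to a single natural-language token (illustrative).
verbalizer = {"negative": " terrible", "positive": " great"}
label_token_ids = {
    label: tokenizer(word, add_special_tokens=False)["input_ids"][0]
    for label, word in verbalizer.items()
}

@torch.no_grad()
def soft_label(text: str) -> dict:
    # Score the verbalizer tokens at the mask position and normalize them,
    # producing a soft label distribution for one unlabeled example.
    inputs = tokenizer(pattern(text), return_tensors="pt")
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1].item()
    scores = logits[0, mask_pos, list(label_token_ids.values())]
    probs = torch.softmax(scores, dim=-1)
    return dict(zip(label_token_ids, probs.tolist()))

# Soft labels such as these would then be used as training targets for an
# ordinary supervised classifier on the unlabeled set.
print(soft_label("The pizza was cold and the service was slow."))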
