Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference

Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with "task descriptions" in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates input examples as cloze-style phrases to help language models understand a given task. These phrases are then used to assign soft labels to a large set of unlabeled examples. Finally, regular supervised training is performed on the resulting training set. For several tasks and languages, PET outperforms both supervised training and unsupervised approaches in low-resource settings by a large margin.
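To make the procedure concrete, the following is a minimal sketch of the cloze-reformulation step, assuming the Hugging Face transformers library and a masked language model such as roberta-base; the binary sentiment task, the "It was ___." pattern, and the terrible/great verbalizer are illustrative choices, not necessarily the paper's exact configuration. A pattern maps each input to a cloze phrase with one masked token, a verbalizer maps each label to a single token, and the language model's scores for those tokens at the mask position yield soft labels for unlabeled examples, which can then serve as targets for regular supervised training.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_NAME = "roberta-base"  # any pretrained masked LM will do for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()

def pattern(text: str) -> str:
    # Reformulate an input example as a cloze-style phrase with one masked slot.
    return f"{text} It was {tokenizer.mask_token}."

# Verbalizer: map each label to a single natural-language token (illustrative).
verbalizer = {"negative": " terrible", "positive": " great"}
label_token_ids = {
    label: tokenizer(word, add_special_tokens=False)["input_ids"][0]
    for label, word in verbalizer.items()
}

@torch.no_grad()
def soft_label(text: str) -> dict:
    # Score the verbalizer tokens at the mask position and normalize them,
    # producing a soft label distribution for one unlabeled example.
    inputs = tokenizer(pattern(text), return_tensors="pt")
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1].item()
    scores = logits[0, mask_pos, list(label_token_ids.values())]
    probs = torch.softmax(scores, dim=-1)
    return dict(zip(label_token_ids, probs.tolist()))

# Soft labels such as these would then be used as training targets for an
# ordinary supervised classifier on the unlabeled set.
print(soft_label("The pizza was cold and the service was slow."))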
