A weakly supervised textual entailment approach to zero-shot text classification

Zero-shot text classification is a widely studied task that addresses the lack of annotated data for the target classes. The most common approach is to reformulate classification as a textual entailment problem, which allows a model to assign texts to classes it was never trained on. This work explores an effective variant of that approach in which the entailment model is trained on a weakly supervised dataset generated from traditional classification data. We empirically study the relation between performance on the entailment task, which serves as a proxy, and performance on the target zero-shot text classification task. Our findings reveal no linear correlation between the two: prolonging fine-tuning can be detrimental to the target task even while the model is still improving on the proxy. We therefore propose a straightforward method to stop training at the right time. As a proof of concept, we introduce a domain-specific zero-shot text classifier trained on Microsoft Academic Graph data. The resulting model, called SCIroShot, achieves state-of-the-art performance in the scientific domain and competitive results in other areas. Both the model and the evaluation benchmark are publicly available on HuggingFace and GitHub.
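Concretely, the entailment reformulation turns each candidate label into a hypothesis and scores it against the input text, so the label set can be chosen freely at inference time. The sketch below shows this usage with the HuggingFace zero-shot pipeline; the model identifier and hypothesis template are assumptions for illustration, so check the official SCIroShot release for the exact values.

```python
from transformers import pipeline

# Entailment-based zero-shot classification: the pipeline pairs the input text
# (premise) with one hypothesis per candidate label and ranks the labels by
# their entailment probability.
# NOTE: the model ID below is an assumption; refer to the SCIroShot release
# on HuggingFace for the published checkpoint name.
classifier = pipeline("zero-shot-classification", model="BSC-LT/sciroshot")

text = "We present a transformer architecture for predicting protein structure."
labels = ["Biology", "Computer Science", "Economics"]

result = classifier(
    text,
    candidate_labels=labels,
    # Hypothesis template assumed for illustration; the paper's may differ.
    hypothesis_template="This example is about {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```

Because the labels are only seen as natural-language hypotheses, swapping in a different taxonomy requires no retraining, which is what makes the entailment proxy attractive for domain-specific classification.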
