A weakly supervised textual entailment approach to zero-shot text classification

Zero-shot text classification is a widely studied task that addresses the lack of annotated data for the target classes. The most common approach is to reformulate classification as a textual entailment problem, which allows a model to assign texts to classes it was never trained on. This work explores an effective variant of that approach in which the entailment model is trained on a weakly supervised dataset generated from traditional classification data. We empirically study the relation between performance on the entailment task, which serves as a proxy, and performance on the target zero-shot text classification task. Our findings reveal no linear correlation between the two: prolonging fine-tuning can be detrimental to the target task even while the model is still improving on the proxy. We therefore propose a straightforward method to stop training at the right time. As a proof of concept, we introduce a domain-specific zero-shot text classifier trained on Microsoft Academic Graph data. The resulting model, called SCIroShot, achieves state-of-the-art performance in the scientific domain and competitive results in other areas. Both the model and the evaluation benchmark are publicly available on HuggingFace and GitHub.
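Concretely, the entailment reformulation turns each candidate label into a hypothesis and scores it against the input text, so the label set can be chosen freely at inference time. The sketch below shows this usage with the HuggingFace zero-shot pipeline; the model identifier and hypothesis template are assumptions for illustration, so check the official SCIroShot release for the exact values.

```python
from transformers import pipeline

# Entailment-based zero-shot classification: the pipeline pairs the input text
# (premise) with one hypothesis per candidate label and ranks the labels by
# their entailment probability.
# NOTE: the model ID below is an assumption; refer to the SCIroShot release
# on HuggingFace for the published checkpoint name.
classifier = pipeline("zero-shot-classification", model="BSC-LT/sciroshot")

text = "We present a transformer architecture for predicting protein structure."
labels = ["Biology", "Computer Science", "Economics"]

result = classifier(
    text,
    candidate_labels=labels,
    # Hypothesis template assumed for illustration; the paper's may differ.
    hypothesis_template="This example is about {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```

Because the labels are only seen as natural-language hypotheses, swapping in a different taxonomy requires no retraining, which is what makes the entailment proxy attractive for domain-specific classification.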
