Building Efficient Universal Classifiers with Natural Language Inference

Generative Large Language Models (LLMs) have become the mainstream choice for few-shot and zero-shot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, allowing them to perform any text classification task without requiring fine-tuning (zero-shot classification) or to learn new tasks from only a few examples (few-shot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows principles similar to instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier, trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zero-shot classifiers, which have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zero-shot performance by 9.4%.
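The core idea of NLI-based universal classification can be sketched as follows: the input text becomes the NLI premise, and each candidate class label is verbalized into a hypothesis via a template; a trained NLI model then scores each (premise, hypothesis) pair, and the label whose hypothesis receives the highest entailment probability is chosen. The snippet below is a minimal illustration of this reformulation step (the function name and template are illustrative, not from the paper):

```python
def to_nli_pairs(text, labels, template="This example is about {}."):
    """Reformulate a classification task as NLI: the input text is the
    premise and each candidate label is verbalized into a hypothesis."""
    return [(text, template.format(label)) for label in labels]

pairs = to_nli_pairs(
    "The sun is shining today",
    ["weather", "politics", "sports"],
)
# An NLI model scores each pair for entailment vs. not-entailment;
# the label with the highest entailment probability is the prediction.
for premise, hypothesis in pairs:
    print(premise, "=>", hypothesis)
```

In practice, this logic is wrapped by the Hugging Face `zero-shot-classification` pipeline, which accepts any NLI-trained checkpoint and a list of candidate labels and performs the pairing and entailment scoring internally.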
