Small Models are Valuable Plug-ins for Large Language Models

Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available and their immense size makes them difficult to tune on commodity hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. The common alternative, In-Context Learning (ICL), can use only a small number of supervised examples because of the context length limit. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
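The following is a minimal sketch of the plug-in idea described above: a locally fine-tuned small model annotates both the in-context demonstrations and the test input with its prediction and confidence, and the resulting prompt is handed to a black-box LLM for the final decision. The prompt wording, the DistilBERT sentiment classifier, and the `query_llm` stub are illustrative assumptions, not the paper's exact format.

```python
# Sketch of SuperICL-style prompt construction (assumptions: prompt wording,
# choice of plug-in model, and the hypothetical query_llm stub).
from transformers import pipeline

# Small plug-in model: any locally fine-tuned classifier works; this public
# sentiment model is used purely for illustration.
plugin = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def annotate(text: str) -> str:
    """Run the plug-in model and format its prediction with confidence."""
    pred = plugin(text)[0]
    return (f"Input: {text}\n"
            f"Plug-in prediction: {pred['label']} (confidence {pred['score']:.2f})")

def build_prompt(demos: list[tuple[str, str]], test_input: str) -> str:
    """Concatenate annotated demonstrations and the annotated test input."""
    parts = [annotate(text) + f"\nLabel: {gold}\n" for text, gold in demos]
    parts.append(annotate(test_input) + "\nLabel:")
    return "\n".join(parts)

demos = [
    ("A delightful, moving film.", "POSITIVE"),
    ("The plot is a mess and the acting is worse.", "NEGATIVE"),
]
prompt = build_prompt(demos, "An uneven but ultimately rewarding story.")
print(prompt)

# The prompt would then be sent to a black-box LLM (e.g. GPT-3/GPT-4) through
# its API; query_llm is a hypothetical stand-in for that call.
# final_label = query_llm(prompt)
```

In this setup the LLM sees how the plug-in model's predictions and confidences relate to the ground-truth labels in the demonstrations, so it can decide when to follow or override the plug-in on the test input.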
