Small Models are Valuable Plug-ins for Large Language Models

Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often not publicly available and their immense size makes them difficult to tune on commodity hardware. As a result, effectively tuning these models with large-scale supervised data can be challenging. The common alternative, In-Context Learning (ICL), can use only a small number of supervised examples because of the context length limit. In this paper, we propose Super In-Context Learning (SuperICL), which allows black-box LLMs to work with locally fine-tuned smaller models, resulting in superior performance on supervised tasks. Our experiments demonstrate that SuperICL can improve performance beyond state-of-the-art fine-tuned models while addressing the instability problem of in-context learning. Furthermore, SuperICL can enhance the capabilities of smaller models, such as multilinguality and interpretability.
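The following is a minimal sketch of the plug-in idea described above: a locally fine-tuned small model annotates both the in-context demonstrations and the test input with its prediction and confidence, and the resulting prompt is handed to a black-box LLM for the final decision. The prompt wording, the DistilBERT sentiment classifier, and the `query_llm` stub are illustrative assumptions, not the paper's exact format.

```python
# Sketch of SuperICL-style prompt construction (assumptions: prompt wording,
# choice of plug-in model, and the hypothetical query_llm stub).
from transformers import pipeline

# Small plug-in model: any locally fine-tuned classifier works; this public
# sentiment model is used purely for illustration.
plugin = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

def annotate(text: str) -> str:
    """Run the plug-in model and format its prediction with confidence."""
    pred = plugin(text)[0]
    return (f"Input: {text}\n"
            f"Plug-in prediction: {pred['label']} (confidence {pred['score']:.2f})")

def build_prompt(demos: list[tuple[str, str]], test_input: str) -> str:
    """Concatenate annotated demonstrations and the annotated test input."""
    parts = [annotate(text) + f"\nLabel: {gold}\n" for text, gold in demos]
    parts.append(annotate(test_input) + "\nLabel:")
    return "\n".join(parts)

demos = [
    ("A delightful, moving film.", "POSITIVE"),
    ("The plot is a mess and the acting is worse.", "NEGATIVE"),
]
prompt = build_prompt(demos, "An uneven but ultimately rewarding story.")
print(prompt)

# The prompt would then be sent to a black-box LLM (e.g. GPT-3/GPT-4) through
# its API; query_llm is a hypothetical stand-in for that call.
# final_label = query_llm(prompt)
```

In this setup the LLM sees how the plug-in model's predictions and confidences relate to the ground-truth labels in the demonstrations, so it can decide when to follow or override the plug-in on the test input.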
