Complementary Explanations for Effective In-Context Learning

Large language models (LLMs) have exhibited remarkable capabilities in learning from explanations in prompts, but little is understood about exactly how these explanations function or why they are effective. This work aims to better understand the mechanisms by which explanations are used for in-context learning. We first study the impact of two distinct factors on the performance of prompts with explanations: the computation trace (the way the solution is decomposed) and the natural language used to express the prompt. By perturbing explanations on three controlled tasks, we show that both factors contribute to the effectiveness of explanations. We further study how to form maximally effective sets of explanations for solving a given test query. We find that LLMs can benefit from the complementarity of the explanation set: diverse reasoning skills shown by different exemplars can lead to better performance. We therefore propose a maximal-marginal-relevance-based exemplar selection approach that constructs exemplar sets that are both relevant and complementary, which improves in-context learning performance across three real-world tasks on multiple LLMs.
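To make the selection criterion concrete, the sketch below illustrates the classic maximal marginal relevance (MMR) trade-off: each candidate exemplar is scored by its relevance to the test query minus its redundancy with exemplars already selected. This is a minimal sketch under stated assumptions, not the paper's exact scoring: the embeddings, the trade-off weight `lam`, and the helper name `mmr_select` are illustrative, and unit-normalized vectors are assumed so that dot products act as cosine similarities.

```python
import numpy as np

def mmr_select(query_emb, exemplar_embs, k, lam=0.5):
    """Greedy MMR selection of k exemplar indices.

    Scores each remaining candidate as
        lam * sim(candidate, query) - (1 - lam) * max_j sim(candidate, selected_j)
    and picks the argmax, so the chosen set stays relevant to the query
    while remaining mutually diverse. Assumes unit-normalized embeddings.
    """
    relevance = exemplar_embs @ query_emb          # sim(exemplar, query) for every candidate
    selected = []
    candidates = list(range(len(exemplar_embs)))
    while candidates and len(selected) < k:
        if selected:
            # Redundancy: each candidate's max similarity to anything already chosen.
            redundancy = np.max(exemplar_embs[candidates] @ exemplar_embs[selected].T, axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        scores = lam * relevance[candidates] - (1 - lam) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with random unit-normalized vectors standing in for real sentence
# embeddings of the test query and the candidate exemplars.
rng = np.random.default_rng(0)
query = rng.normal(size=64)
query /= np.linalg.norm(query)
pool = rng.normal(size=(100, 64))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)
print(mmr_select(query, pool, k=8, lam=0.5))
```

In use, the selected exemplars (with their explanations) would be concatenated ahead of the test query to form the prompt; `lam` controls how strongly the set favors query relevance over diversity among the exemplars.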
