Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN on any split (including the length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.
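
To make the two-stage procedure concrete, below is a minimal Python sketch of least-to-most prompting. The exemplar text, the prompt formats, and the `decompose`/`solve` helpers are illustrative assumptions for this sketch, not the paper's actual prompts (those are included in the Appendix); `model` stands in for any text-completion function.

```python
from typing import Callable, List

# Minimal sketch of least-to-most prompting (illustrative only).
# Stage 1 decomposes a problem into simpler subproblems; stage 2
# solves them in order, appending each answer to the context so
# later subproblems can build on earlier ones.

# Hypothetical decomposition exemplar, loosely following the paper's
# running example; a real prompt would contain several exemplars.
DECOMPOSE_EXEMPLAR = (
    "Q: It takes Amy 4 minutes to climb to the top of a slide and "
    "1 minute to slide down. The slide closes in 15 minutes. "
    "How many times can she slide before it closes?\n"
    "A: How long does each trip take?\n\n"
)

def decompose(problem: str, model: Callable[[str], str]) -> List[str]:
    """Stage 1: ask the model to reduce the problem to subproblems.

    Assumes the model returns one subproblem per line.
    """
    reply = model(DECOMPOSE_EXEMPLAR + f"Q: {problem}\nA:")
    return [line.strip() for line in reply.splitlines() if line.strip()]

def solve(problem: str, subproblems: List[str],
          model: Callable[[str], str]) -> str:
    """Stage 2: solve subproblems sequentially with accumulated context."""
    context = problem + "\n"
    answer = ""
    for sub in subproblems + [problem]:  # finish with the original question
        answer = model(context + f"Q: {sub}\nA:").strip()
        context += f"Q: {sub}\nA: {answer}\n"  # feed answers forward
    return answer

def least_to_most(problem: str, model: Callable[[str], str]) -> str:
    return solve(problem, decompose(problem, model), model)
```

In practice, `model` would wrap a completion endpoint (the paper's experiments use GPT-3 code-davinci-002), and each stage's prompt would carry a handful of few-shot exemplars rather than the single one shown here.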
