Progressive-Hint Prompting Improves Reasoning in Large Language Models

The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables automatic multiple interactions between users and LLMs by using previously generated answers as hints to progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance. We conducted extensive and comprehensive experiments on seven benchmarks. The results show that PHP significantly improves accuracy while remaining highly efficient. For instance, with text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding compared to Complex CoT, and a 46.17% reduction in sampled paths with self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance on SVAMP (89.1% → 91.9%), GSM8K (92% → 95.5%), AQuA (76.4% → 79.9%), and MATH (50.3% → 53.9%).
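The interaction loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact prompt templates: `ask_llm` is a hypothetical callable standing in for any LLM API, the hint phrasing is an assumption, and the stopping rule (stop when two consecutive answers agree) follows the progressive-hint idea described above.

```python
def progressive_hint_prompting(ask_llm, question, max_rounds=5):
    """Sketch of the PHP loop: re-ask the question with all previously
    generated answers appended as hints, stopping once two consecutive
    answers agree (or after max_rounds attempts)."""
    hints = []   # answers from earlier rounds, reused as hints
    prev = None  # answer from the immediately preceding round
    for _ in range(max_rounds):
        if hints:
            # Append prior answers as a progressive hint (phrasing is illustrative).
            prompt = f"{question} (Hint: the answer is near {', '.join(hints)}.)"
        else:
            prompt = question  # first round: plain question, e.g. with a CoT prompt
        answer = ask_llm(prompt)
        if answer == prev:
            return answer  # converged: two consecutive rounds agree
        hints.append(answer)
        prev = answer
    return prev  # no convergence within the budget; return the last answer
```

In this sketch, self-consistency would slot in naturally inside `ask_llm` (sample several reasoning paths per round and majority-vote the answer), which is why PHP composes with it rather than replacing it.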
