Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks. It aims to elicit Large Language Models (LLMs) to generate intermediate reasoning steps, either through a simple prompt such as "Let's think step by step" or through multiple in-context exemplars with well-designed rationales. However, the generated rationales often contain mistakes, producing unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting method, which elicits LLMs to generate explicit pieces of knowledge evidence in the form of structured triples. This is inspired by human behavior: before answering a complex question, we can sketch a mind map or knowledge map in our heads as reasoning evidence. Benefiting from CoK, we additionally introduce an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For unreliable responses, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method further improves performance on commonsense, factual, symbolic, and arithmetic reasoning tasks.
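To make the idea concrete, the sketch below shows how a CoK-style prompt with triple-form evidence might be assembled, plus a toy factuality check against a reference knowledge base. The exemplar wording, the `(subject, relation, object)` notation, and the `factuality_score` helper are illustrative assumptions, not the paper's exact template or verification procedure.

```python
def format_triples(triples):
    """Render evidence triples as '(subject, relation, object)' lines."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)

def build_cok_prompt(exemplars, question):
    """Assemble an in-context prompt whose rationales are explicit triples."""
    parts = []
    for ex in exemplars:
        parts.append(
            f"Q: {ex['question']}\n"
            f"Evidence triples:\n{format_triples(ex['triples'])}\n"
            f"A: {ex['answer']}\n"
        )
    # The model is asked to continue by emitting its own evidence triples.
    parts.append(f"Q: {question}\nEvidence triples:")
    return "\n".join(parts)

def factuality_score(triples, kb):
    """Toy factuality check: fraction of generated triples found in a reference KB."""
    return sum(t in kb for t in triples) / len(triples)

exemplar = {
    "question": "Do hamsters provide food for any animals?",
    "triples": [
        ("hamster", "IsA", "prey animal"),
        ("prey animal", "IsFoodFor", "predators"),
    ],
    "answer": "Yes",
}

prompt = build_cok_prompt([exemplar], "Can a sunflower grow in complete darkness?")
print(prompt)
```

In the full method, triples the verifier flags as unfactual or unfaithful would be pointed out to the LLM, prompting it to regenerate its reasoning chain.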
