Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

Providing knowledge documents to large language models (LLMs) has emerged as a promising way to update the static knowledge stored in their parameters. However, knowledge in a document may conflict with the memory of LLMs because the knowledge in the LLMs’ parameters is outdated or incorrect. This makes it necessary to examine the capability of LLMs to assimilate supplemental external knowledge that conflicts with their memory. While previous studies have explained to what extent LLMs extract conflicting knowledge from the provided text, they neglect the necessity to reason with conflicting knowledge. Furthermore, there is a lack of detailed analysis of strategies that enable LLMs to resolve conflicting knowledge via prompting, decoding strategy, and supervised fine-tuning. To address these limitations, we construct a new dataset, dubbed KNOT, for examining knowledge conflict resolution in the form of question answering. KNOT facilitates in-depth analysis by dividing reasoning with conflicting knowledge into three levels: (1) Direct Extraction, which directly extracts conflicting knowledge to answer questions; (2) Explicit Reasoning, which reasons with conflicting knowledge when the reasoning path is explicitly provided in the question; and (3) Implicit Reasoning, where reasoning with conflicting knowledge requires LLMs to infer the reasoning path independently to answer questions. We also conduct extensive experiments on KNOT to establish empirical guidelines for LLMs to utilize conflicting knowledge in complex circumstances. The dataset and associated code can be accessed at our GitHub repository.
