Editing Large Language Models: Problems, Methods, and Opportunities

Recent advances in deep learning have given rise to large language models (LLMs) that exhibit an impressive ability to understand and produce human-like text. Yet despite our capacity to train highly capable LLMs, the methodology for keeping them current and rectifying their errors remains elusive. To that end, the past few years have witnessed a surge of techniques for editing LLMs, whose objective is to alter a model's behavior within a specific domain without negatively impacting its performance on other inputs. This paper offers a deep exploration of the problems, methods, and opportunities of model editing for LLMs. In particular, we provide an exhaustive overview of the task definition and the challenges associated with model editing, along with an in-depth empirical analysis of the most advanced methods currently available. We also build a new benchmark dataset to enable more robust evaluation and to pinpoint enduring issues intrinsic to existing techniques. Our objective is to provide valuable insights into the effectiveness and feasibility of each model-editing technique, thereby helping the research community make informed decisions when choosing the most appropriate method for a specific task or context. Code and datasets will be available at https://github.com/zjunlp/EasyEdit.
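The editing objective described above is commonly decomposed into three properties: the edit must take effect on the target input (reliability), carry over to paraphrases within the edit scope (generality), and leave unrelated inputs unchanged (locality). The following toy sketch illustrates these three checks with a stand-in "model" implemented as a lookup table; all names and the `edit` helper are hypothetical illustrations, not the EasyEdit API or any method from this paper.

```python
# A stand-in "model": a lookup table mapping prompts to answers.
model = {
    "The capital of France is": "Paris",
    "France's capital city is": "Paris",      # paraphrase of the same fact
    "The capital of Germany is": "Berlin",    # unrelated fact, used for the locality check
}

def edit(model, in_scope_prompts, new_answer):
    """Apply an edit: remap every in-scope prompt to the new answer,
    leaving all out-of-scope prompts untouched."""
    edited = dict(model)
    for prompt in in_scope_prompts:
        edited[prompt] = new_answer
    return edited

# A counterfactual edit: change the capital of France to "Lyon".
edited = edit(
    model,
    ["The capital of France is", "France's capital city is"],
    "Lyon",
)

assert edited["The capital of France is"] == "Lyon"     # reliability
assert edited["France's capital city is"] == "Lyon"     # generality (paraphrase)
assert edited["The capital of Germany is"] == "Berlin"  # locality (unrelated fact intact)
```

Real editing methods replace the dictionary update with a change to model parameters, an external memory, or retrieved context, but the evaluation criteria sketched by the three assertions are the same.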
