Locating and Editing Factual Knowledge in GPT
[1] Li Dong, et al. Knowledge Neurons in Pretrained Transformers, 2021, ACL.
[2] Nicola De Cao, et al. Editing Factual Knowledge in Language Models, 2021, EMNLP.
[3] Danqi Chen, et al. Factual Probing Is [MASK]: Learning vs. Learning to Recall, 2021, NAACL.
[4] Yonatan Belinkov, et al. Probing Classifiers: Promises, Shortcomings, and Advances, 2021, CL.
[5] E. Hovy, et al. Measuring and Improving Consistency in Pretrained Language Models, 2021, TACL.
[6] Roger Wattenhofer, et al. Of Non-Linearity and Commutativity in BERT, 2021, IJCNN.
[7] Omer Levy, et al. Transformer Feed-Forward Layers Are Key-Value Memories, 2020, EMNLP.
[8] Yoav Goldberg, et al. Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals, 2020, TACL.
[9] David Bau, et al. Rewriting a Deep Generative Model, 2020, ECCV.
[10] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[11] Uri Shalit, et al. CausaLM: Causal Model Explanation Through Counterfactual Language Models, 2020, CL.
[12] Tomohide Shibata. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[13] Fabio Petroni, et al. How Context Affects Language Models' Factual Predictions, 2020, AKBC.
[14] Colin Raffel, et al. How Much Knowledge Can You Pack into the Parameters of a Language Model?, 2020, EMNLP.
[15] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[16] Peter J. Liu, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, JMLR.
[17] Sebastian Riedel, et al. Language Models as Knowledge Bases?, 2019, EMNLP.
[18] Zhe Gan, et al. Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization, 2018, NeurIPS.
[19] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[20] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[21] Judea Pearl, et al. Direct and Indirect Effects, 2001, UAI.
[22] James A. Anderson, et al. A simple neural network generating an interactive memory, 1972.
[23] Teuvo Kohonen, et al. Correlation Matrix Memories, 1972, IEEE Transactions on Computers.
[24] Huteng Dai, et al. Learning nonlocal phonotactics in Strictly Piecewise phonotactic model, 2021, SCIL.
[25] Yonatan Belinkov, et al. Investigating Gender Bias in Language Models Using Causal Mediation Analysis, 2020, NeurIPS.
[26] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[27] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.