Knowledge Neurons in Pretrained Transformers

Large-scale pretrained language models are surprisingly good at recalling factual knowledge presented in the training corpus. In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons. Specifically, we examine the fill-in-the-blank cloze task for BERT. Given a relational fact, we propose a knowledge attribution method to identify the neurons that express the fact. We find that the activation of such knowledge neurons is positively correlated with the expression of their corresponding facts. In our case studies, we attempt to leverage knowledge neurons to edit (e.g., update or erase) specific factual knowledge without fine-tuning. Our results shed light on how factual knowledge is stored within pretrained Transformers.
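
To make the knowledge-attribution idea concrete, the sketch below approximates integrated gradients over the intermediate (feed-forward) activations of a single BERT layer at the [MASK] position, ranking neurons by their attribution toward the correct answer token. This is a minimal illustration assuming the HuggingFace `transformers` and `torch` packages, not the paper's released implementation; the prompt, target token, layer index, and step count are illustrative choices.

```python
# Minimal sketch: approximate integrated gradients over the FFN intermediate
# activations of one BERT layer, at the [MASK] position, with respect to the
# logit of the correct answer token. Layer index and step count are illustrative.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()
for p in model.parameters():          # we only need gradients w.r.t. activations
    p.requires_grad_(False)

prompt, target = "The capital of France is [MASK].", "Paris"   # illustrative fact
layer_idx, steps = 9, 20                                       # illustrative choices

inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
target_id = tokenizer.convert_tokens_to_ids(target)
ffn = model.bert.encoder.layer[layer_idx].intermediate

# 1) Record the original intermediate activation at the [MASK] position.
cache = {}
handle = ffn.register_forward_hook(lambda m, i, o: cache.__setitem__("act", o.detach()))
with torch.no_grad():
    model(**inputs)
handle.remove()
base_act = cache["act"][0, mask_pos]          # shape: (intermediate_size,)

# 2) Riemann approximation of integrated gradients: scale the activation from
#    0 to its observed value and accumulate the gradient of the target logit.
total_grad = torch.zeros_like(base_act)
for alpha in torch.linspace(0.0, 1.0, steps):
    scaled = (alpha * base_act).clone().requires_grad_(True)

    def replace(module, inp, out, scaled=scaled):
        out = out.clone()
        out[0, mask_pos] = scaled             # splice the scaled activation in
        return out

    handle = ffn.register_forward_hook(replace)
    logit = model(**inputs).logits[0, mask_pos, target_id]
    handle.remove()
    logit.backward()
    total_grad += scaled.grad

attribution = base_act * total_grad / steps   # per-neuron attribution scores
scores, neuron_ids = attribution.topk(5)
print(f"layer {layer_idx}, top neurons: {neuron_ids.tolist()}")
```

Neurons with the highest attribution scores would be candidate knowledge neurons for the given fact; in this sketch, one would repeat the procedure over layers and paraphrased prompts and keep neurons that score highly across prompts before attempting edits such as suppressing or amplifying their activations.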
