Knowledge Neurons in Pretrained Transformers
Li Dong | Furu Wei | Damai Dai | Zhifang Sui | Y. Hao
[1] Jean-Baptiste Cordonnier, et al. Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth, 2021, ICML.
[2] E. Hovy, et al. Measuring and Improving Consistency in Pretrained Language Models, 2021, Transactions of the Association for Computational Linguistics.
[3] Omer Levy, et al. Transformer Feed-Forward Layers Are Key-Value Memories, 2020, EMNLP.
[4] Graham Neubig, et al. X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models, 2020, EMNLP.
[5] Ming Zhou, et al. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training, 2020, NAACL.
[6] Ke Xu, et al. Self-Attention Attribution: Interpreting Information Interactions Inside Transformer, 2020, AAAI.
[7] Jianfeng Gao, et al. UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training, 2020, ICML.
[8] Colin Raffel, et al. How Much Knowledge Can You Pack into the Parameters of a Language Model?, 2020, EMNLP.
[9] Frank F. Xu, et al. How Can We Know What Language Models Know?, 2019, Transactions of the Association for Computational Linguistics.
[10] Myle Ott, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.
[11] Sebastian Riedel, et al. Language Models as Knowledge Bases?, 2019, EMNLP.
[12] Anna Rumshisky, et al. Revealing the Dark Secrets of BERT, 2019, EMNLP.
[13] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[14] Yejin Choi, et al. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction, 2019, ACL.
[15] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT’s Attention, 2019, BlackboxNLP@ACL.
[16] Yonatan Belinkov, et al. Analyzing the Structure of Attention in a Transformer Language Model, 2019, BlackboxNLP@ACL.
[17] Fedor Moiseev, et al. Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned, 2019, ACL.
[18] Dipanjan Das, et al. BERT Rediscovers the Classical NLP Pipeline, 2019, ACL.
[19] Xiaodong Liu, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation, 2019, NeurIPS.
[20] Felix Wu, et al. Pay Less Attention with Lightweight and Dynamic Convolutions, 2019, ICLR.
[21] Guillaume Lample, et al. Cross-lingual Language Model Pretraining, 2019, NeurIPS.
[22] Christophe Gravier, et al. T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples, 2018, LREC.
[23] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[24] Avanti Shrikumar, et al. Learning Important Features Through Propagating Activation Differences, 2017, ICML.
[25] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.
[26] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, arXiv:1606.08415.
[27] Alexander Binder, et al. Layer-Wise Relevance Propagation for Neural Networks with Local Renormalization Layers, 2016, ICANN.
[28] Martin A. Riedmiller, et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.
[29] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[30] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[31] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[32] Motoaki Kawanabe, et al. How to Explain Individual Classification Decisions, 2009, J. Mach. Learn. Res.
[33] Daniel Jurafsky, et al. Distant supervision for relation extraction without labeled data, 2009, ACL.
[34] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.