Zhifang Sui, Furu Wei, Yaru Hao, Damai Dai, Li Dong