Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors
[1] Yulia Tsvetkov, et al. Sparse Overcomplete Word Vector Representations, 2015, ACL.
[2] Alex Wang, et al. What do you learn from context? Probing for sentence structure in contextualized word representations, 2019, ICLR.
[3] Yonatan Belinkov, et al. Linguistic Knowledge and Transferability of Contextual Representations, 2019, NAACL.
[4] Anna Rumshisky, et al. A Primer in BERTology: What We Know About How BERT Works, 2020, Transactions of the Association for Computational Linguistics.
[5] Geoffrey E. Hinton, et al. How to Represent Part-Whole Hierarchies in a Neural Network, 2021, Neural Computation.
[6] Sanjeev Arora, et al. Linear Algebraic Structure of Word Senses, with Applications to Polysemy, 2016, TACL.
[7] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.
[8] Brian Cheung, et al. Word Embedding Visualization Via Dictionary Learning, 2019, ArXiv.
[9] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[10] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[11] Lior Wolf, et al. Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[12] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Kawin Ethayarajh, et al. How Contextual are Contextualized Word Representations? Comparing the Geometry of BERT, ELMo, and GPT-2 Embeddings, 2019, EMNLP.
[14] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[15] Christopher D. Manning, et al. A Structural Probe for Finding Syntax in Word Representations, 2019, NAACL.
[16] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[17] Geoffrey E. Hinton, et al. Dynamic Routing Between Capsules, 2017, NIPS.
[18] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[19] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.
[20] Graham Neubig, et al. How Can We Know What Language Models Know?, 2019, Transactions of the Association for Computational Linguistics.
[21] Lior Wolf, et al. Transformer Interpretability Beyond Attention Visualization, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Martin Wattenberg, et al. Visualizing and Measuring the Geometry of BERT, 2019, NeurIPS.