Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel
[1] Ruslan Salakhutdinov,et al. Multimodal Transformer for Unaligned Multimodal Language Sequences , 2019, ACL.
[2] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[3] Myle Ott,et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling , 2019, NAACL.
[4] Ali Farhadi,et al. Video Relationship Reasoning Using Gated Spatio-Temporal Energy Graph , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Pablo Barceló,et al. On the Turing Completeness of Modern Neural Network Architectures , 2019, ICLR.
[6] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[7] Yee Whye Teh,et al. Set Transformer , 2018, ICML.
[8] Andrew M. Dai,et al. Music Transformer: Generating Music with Long-Term Structure , 2018, ICLR.
[9] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[10] Douglas Eck,et al. Music Transformer , 2018, 1809.04281.
[11] Douglas Eck,et al. An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation , 2018, ArXiv.
[12] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[13] Vladlen Koltun,et al. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling , 2018, ArXiv.
[14] Dustin Tran,et al. Image Transformer , 2018, ICML.
[15] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Marc'Aurelio Ranzato,et al. Classical Structured Prediction Losses for Sequence to Sequence Learning , 2017, NAACL.
[17] Adam R. Kosiorek,et al. Set Transformer , 2018, ArXiv.
[18] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[19] Yiming Yang,et al. MMD GAN: Towards Deeper Understanding of Moment Matching Network , 2017, NIPS.
[20] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[21] Andrew Gordon Wilson,et al. Deep Kernel Learning , 2015, AISTATS.
[22] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[23] Trevor Darrell,et al. What you saw is not what you get: Domain adaptation using asymmetric kernel transforms , 2011, CVPR 2011.
[24] Alper Yilmaz,et al. Object Tracking by Asymmetric Kernel Mean Shift with Automatic Scale and Orientation Selection , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[25] Stergios B. Fotopoulos,et al. All of Nonparametric Statistics , 2007, Technometrics.
[26] Jean-Michel Morel,et al. A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[27] Anthony Widjaja,et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.
[28] K. Tsuda. Support Vector Classifier with Asymmetric Kernel Functions , 1998.