Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar | Edouard Grave | Piotr Bojanowski | Armand Joulin