Self-Attention with Relative Position Representations
References

[1] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[2] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[3] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[4] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Alex Graves, et al. Neural Machine Translation in Linear Time, 2016, ArXiv.
[6] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[7] Pietro Liò, et al. Graph Attention Networks, 2017, ICLR.
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[10] Jakob Uszkoreit, et al. A Decomposable Attention Model for Natural Language Inference, 2016, EMNLP.
[11] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[12] Yann Dauphin, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[13] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[14] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.