The Annotated Transformer
[1] Ashish Vaswani, et al. Attention Is All You Need, 2017, NIPS.
[2] Dzmitry Bahdanau, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2015, ICLR.
[3] Jimmy Lei Ba, et al. Layer Normalization, 2016, arXiv.
[4] Jonas Gehring, et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[5] Denny Britz, et al. Massive Exploration of Neural Machine Translation Architectures, 2017, EMNLP.
[6] Ofir Press, et al. Using the Output Embedding to Improve Language Models, 2017, EACL.
[7] Kaiming He, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[8] Alex Graves. Generating Sequences With Recurrent Neural Networks, 2013, arXiv.
[9] Christian Szegedy, et al. Rethinking the Inception Architecture for Computer Vision, 2016, CVPR.
[10] Diederik P. Kingma, et al. Adam: A Method for Stochastic Optimization, 2015, ICLR.
[11] Nitish Srivastava, et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, JMLR.