[1] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Xavier Gastaldi,et al. Shake-Shake regularization , 2017, ArXiv.
[4] Wei Xu,et al. Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation , 2016, TACL.
[5] Richard Socher,et al. A Deep Reinforced Model for Abstractive Summarization , 2017, ICLR.
[6] Samy Bengio,et al. Can Active Memory Replace Attention? , 2016, NIPS.
[7] Jakob Uszkoreit,et al. A Decomposable Attention Model for Natural Language Inference , 2016, EMNLP.
[8] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[9] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[10] Chris Dyer,et al. On the State of the Art of Evaluation in Neural Language Models , 2017, ICLR.
[11] Rico Sennrich,et al. Deep architectures for Neural Machine Translation , 2017, WMT.
[12] Geoffrey Zweig,et al. The Microsoft 2016 Conversational Speech Recognition System , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Geoffrey E. Hinton,et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , 2017, ICLR.
[14] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[15] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[16] Yoshua Bengio,et al. Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies , 2001.
[17] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[18] Yu Zhang,et al. Training RNNs as Fast as CNNs , 2017, EMNLP.
[19] Geoffrey E. Hinton,et al. Layer Normalization , 2016, ArXiv.
[20] Rico Sennrich,et al. Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.
[21] Sepp Hochreiter,et al. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions , 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst.
[22] Yann Dauphin,et al. Convolutional Sequence to Sequence Learning , 2017, ICML.
[23] Yann Dauphin,et al. A Convolutional Encoder Model for Neural Machine Translation , 2016, ACL.
[24] Richard Socher,et al. Regularizing and Optimizing LSTM Language Models , 2017, ICLR.
[25] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Lorenzo Torresani,et al. BranchConnect: Large-Scale Visual Recognition with Learned Branch Connections , 2017, ArXiv.
[27] Richard Socher,et al. Quasi-Recurrent Neural Networks , 2016, ICLR.
[28] Lorenzo Torresani,et al. BranchConnect: Image Categorization with Learned Branch Connections , 2018, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).
[29] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[30] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[31] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[32] Geoffrey E. Hinton,et al. Speech Recognition with Deep Recurrent Neural Networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Bowen Zhou,et al. A Structured Self-attentive Sentence Embedding , 2017, ICLR.
[34] Alex Graves,et al. Neural Machine Translation in Linear Time , 2016, ArXiv.
[35] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Alexander M. Rush,et al. Structured Attention Networks , 2017, ICLR.
[37] Lukasz Kaiser,et al. Attention Is All You Need , 2017, NIPS.