Ashish Vaswani | Noam Shazeer | Niki Parmar | Jakob Uszkoreit | Llion Jones | Aidan N. Gomez | Lukasz Kaiser | Illia Polosukhin
[1] Mirella Lapata et al. Long Short-Term Memory-Networks for Machine Reading, 2016, EMNLP.
[2] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Samy Bengio et al. Can Active Memory Replace Attention?, 2016, NIPS.
[4] Alexander M. Rush et al. Structured Attention Networks, 2017, ICLR.
[5] François Chollet et al. Xception: Deep Learning with Depthwise Separable Convolutions, 2016, CVPR.
[6] Wei Xu et al. Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation, 2016, TACL.
[7] Beatrice Santorini et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, Computational Linguistics.
[8] Jason Weston et al. End-To-End Memory Networks, 2015, NIPS.
[9] Richard Socher et al. A Deep Reinforced Model for Abstractive Summarization, 2017, ICLR.
[10] Yoshua Bengio et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, arXiv.
[11] Geoffrey E. Hinton et al. Grammar as a Foreign Language, 2014, NIPS.
[12] Sergey Ioffe et al. Rethinking the Inception Architecture for Computer Vision, 2015, CVPR.
[13] Boris Ginsburg et al. Factorization Tricks for LSTM Networks, 2017, ICLR.
[14] George Kurian et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.
[15] Yann Dauphin et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[16] Dan Klein et al. Learning Accurate, Compact, and Interpretable Tree Annotation, 2006, ACL.
[17] Yoshua Bengio et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[18] Eugene Charniak et al. Effective Self-Training for Parsing, 2006, NAACL.
[19] Bowen Zhou et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.
[20] Quoc V. Le et al. Massive Exploration of Neural Machine Translation Architectures, 2017, EMNLP.
[21] Geoffrey E. Hinton et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[22] Lukasz Kaiser et al. Neural GPUs Learn Algorithms, 2015, ICLR.
[23] Alex Graves et al. Neural Machine Translation in Linear Time, 2016, arXiv.
[24] Alex Graves et al. Generating Sequences With Recurrent Neural Networks, 2013, arXiv.
[25] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[26] Nitish Srivastava et al. Dropout: A Simple Way to Prevent Neural Networks from Overfitting, 2014, JMLR.
[27] Yoshua Bengio et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[28] Yoshua Bengio et al. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, 2001.
[29] Noah A. Smith et al. Recurrent Neural Network Grammars, 2016, NAACL.
[30] Yonghui Wu et al. Exploring the Limits of Language Modeling, 2016, arXiv.
[31] Quoc V. Le et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[32] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[33] Mary P. Harper et al. Self-Training PCFG Grammars with Latent Annotations Across Languages, 2009, EMNLP.
[34] Rico Sennrich et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[35] Lior Wolf et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[36] Quoc V. Le et al. Multi-task Sequence to Sequence Learning, 2015, ICLR.