Guillaume Lample | Hervé Jégou | Edouard Grave | Armand Joulin | Sainbayar Sukhbaatar
[1] Geoffrey E. Hinton, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, 2017, ICLR.
[2] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[3] Angelika Steger, et al. Fast-Slow Recurrent Neural Networks, 2017, NIPS.
[4] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[5] Yoshua Bengio, et al. Quick Training of Probabilistic Neural Nets by Importance Sampling, 2003.
[6] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[7] Joshua Goodman, et al. A bit of progress in language modeling, 2001, Comput. Speech Lang.
[8] Yann Dauphin, et al. Pay Less Attention with Lightweight and Dynamic Convolutions, 2019, ICLR.
[9] Lior Wolf, et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[10] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[11] Richard Socher, et al. An Analysis of Neural Language Modeling at Multiple Scales, 2018, ArXiv.
[12] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[13] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[14] Alex Graves, et al. Neural Turing Machines, 2014, ArXiv.
[15] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[16] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[17] Peter Dayan, et al. Fast Parametric Learning with Activation Memorization, 2018, ICML.
[18] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[19] Yoshua Bengio, et al. Hierarchical Multiscale Recurrent Neural Networks, 2016, ICLR.
[20] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[21] Jürgen Schmidhuber, et al. Recurrent Highway Networks, 2016, ICML.
[22] Robert A. Jacobs, et al. Hierarchical Mixtures of Experts and the EM Algorithm, 1993, Neural Computation.
[23] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[24] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[25] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[26] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[27] Yonghui Wu, et al. Exploring the Limits of Language Modeling, 2016, ArXiv.
[28] Nicolas Usunier, et al. Improving Neural Language Models with a Continuous Cache, 2016, ICLR.
[29] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[30] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[31] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[32] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[33] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[34] Yoshua Bengio, et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.
[35] Hakan Inan, et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, 2016, ICLR.
[36] Moustapha Cissé, et al. Efficient softmax approximation for GPUs, 2016, ICML.
[37] Steve Renals, et al. Multiplicative LSTM for sequence modelling, 2016, ICLR.
[38] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[39] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[40] Jason Weston, et al. Key-Value Memory Networks for Directly Reading Documents, 2016, EMNLP.
[41] Wojciech Zaremba, et al. Recurrent Neural Network Regularization, 2014, ArXiv.
[42] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.