Marc'Aurelio Ranzato | Y-Lan Boureau | Ronan Collobert | Sandeep Subramanian
[1] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016.
[2] Sanja Fidler, et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books, 2015, ICCV.
[3] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[4] Geoffrey E. Hinton, et al. Layer Normalization, 2016, ArXiv.
[5] Guillaume Lample, et al. Large Memory Layers with Product Keys, 2019, NeurIPS.
[6] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[7] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[8] Yoshua Bengio, et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model, 2016, ICLR.
[9] Yoshua Bengio, et al. Hierarchical Recurrent Neural Networks for Long-Term Dependencies, 1995, NIPS.
[10] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[11] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[12] Alec Radford, et al. Scaling Laws for Neural Language Models, 2020, ArXiv.
[13] Christopher D. Manning, et al. Effective Approaches to Attention-based Neural Machine Translation, 2015, EMNLP.
[14] Yann Dauphin, et al. Hierarchical Neural Story Generation, 2018, ACL.
[15] D. Pelli, et al. The uncrowded window of object recognition, 2008, Nature Neuroscience.
[16] Jürgen Schmidhuber, et al. Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.
[17] Inderjit S. Dhillon, et al. Multiresolution Transformer Networks: Recurrence is Not Essential for Modeling Hierarchical Structure, 2019, ArXiv.
[18] Raquel Urtasun, et al. The Reversible Residual Network: Backpropagation Without Storing Activations, 2017, NIPS.
[19] Eero P. Simoncelli, et al. Metamers of the ventral stream, 2011, Nature Neuroscience.
[20] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[21] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[22] Alexei Baevski, et al. Adaptive Input Representations for Neural Language Modeling, 2018, ICLR.
[23] Vladlen Koltun, et al. Deep Equilibrium Models, 2019, NeurIPS.
[24] Daniel Jurafsky, et al. Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context, 2018, ACL.
[25] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[26] Yoshua Bengio, et al. Hierarchical Multiscale Recurrent Neural Networks, 2016, ICLR.
[27] Jason Weston, et al. Neural Text Generation with Unlikelihood Training, 2019, ICLR.
[28] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[29] Mirella Lapata, et al. Hierarchical Transformers for Multi-Document Summarization, 2019, ACL.
[30] Rob Fergus, et al. Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks, 2015, NIPS.
[31] Lukasz Kaiser, et al. Generating Wikipedia by Summarizing Long Sequences, 2018, ICLR.
[32] Kenneth Heafield, et al. KenLM: Faster and Smaller Language Model Queries, 2011, WMT@EMNLP.
[33] Stanislau Semeniuta, et al. On Accurate Evaluation of GANs for Language Generation, 2018, ArXiv.
[34] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[35] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[36] Lukasz Kaiser, et al. Sample Efficient Text Summarization Using a Single Pre-Trained Transformer, 2019, ArXiv.
[37] Enrique Alfonseca, et al. Eval all, trust a few, do wrong to none: Comparing sentence generation models, 2018, ArXiv.
[38] Dustin Tran, et al. Image Transformer, 2018, ICML.
[39] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[40] Jürgen Schmidhuber, et al. A Clockwork RNN, 2014, ICML.
[41] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[42] Christopher Joseph Pal, et al. Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study, 2019, ACL.
[43] Edward H. Adelson, et al. The Laplacian Pyramid as a Compact Image Code, 1983, IEEE Trans. Commun.
[44] Yann Dauphin, et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[45] Leon A. Gatys, et al. Image content is more important than Bouma’s Law for scene metamers, 2018.
[46] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[47] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.