Jason Weston | Da Ju | Sainbayar Sukhbaatar | Stephen Roller
[1] Lav R. Varshney, et al. CTRL: A Conditional Transformer Language Model for Controllable Generation, 2019, ArXiv.
[2] Omer Levy, et al. Improving Transformer Models by Reordering their Sublayers, 2020, ACL.
[3] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[4] Yoshua Bengio, et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.
[5] Jeremy Blackburn, et al. The Pushshift Reddit Dataset, 2020, ICWSM.
[6] Xing Wang, et al. Modeling Recurrence for Transformer, 2019, NAACL.
[7] Lukás Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[8] Jeffrey L. Elman, et al. Finding Structure in Time, 1990, Cogn. Sci.
[9] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[10] Mary Williamson, et al. Recipes for Building an Open-Domain Chatbot, 2020, EACL.
[11] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[12] Naman Goyal, et al. BASE Layers: Simplifying Training of Large, Sparse Models, 2021, ICML.
[13] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[14] Jacob Devlin, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[15] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[16] Jason Weston, et al. The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents, 2020, ACL.
[17] Ankur Bapna, et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation, 2018, ACL.
[18] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[19] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[20] Noam Shazeer, et al. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity, 2021, ArXiv.
[21] Ray Kurzweil, et al. Learning Semantic Textual Similarity from Conversations, 2018, Rep4NLP@ACL.
[22] Ashish Vaswani, et al. Self-Attention with Relative Position Representations, 2018, NAACL.
[23] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[24] Antoine Bordes, et al. Training Millions of Personalized Dialogue Agents, 2018, EMNLP.
[25] Di He, et al. Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View, 2019, ArXiv.
[26] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[27] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[28] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[29] Jason Weston, et al. Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring, 2020, ICLR.
[30] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[31] Veselin Stoyanov, et al. Unsupervised Cross-lingual Representation Learning at Scale, 2019, ACL.