Yoshua Bengio, Giancarlo Kerg, Guillaume Lajoie, Anirudh Goyal, Bhargav Kanuparthi, Kyle Goyette
[1] Sergey Levine, et al. Recurrent Independent Mechanisms, 2019, ICLR.
[2] Jason Weston, et al. End-To-End Memory Networks, 2015, NIPS.
[3] Xuancheng Ren, et al. Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection, 2019, ArXiv.
[4] Yann LeCun, et al. Recurrent Orthogonal Networks and Long-Memory Tasks, 2016, ICML.
[5] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[6] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[7] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[8] Jeffrey M. Zacks, et al. Event boundaries in memory and cognition, 2017, Current Opinion in Behavioral Sciences.
[9] Yoshua Bengio, et al. Gated Orthogonal Recurrent Units: On Learning to Forget, 2017, Neural Computation.
[10] 知秀 柴田. Understand in 5 Minutes!? Skimming Famous Papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[11] Yan Wu, et al. Optimizing agent behavior over long time scales by transporting value, 2018, Nature Communications.
[12] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[13] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.
[14] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[15] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[16] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[17] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.
[18] Jeffrey M. Zacks, et al. Event perception: a mind-brain perspective, 2007, Psychological Bulletin.
[19] Sepp Hochreiter, et al. Untersuchungen zu dynamischen neuronalen Netzen, 1991.
[20] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[21] Bowen Zhou, et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.
[22] Alex Graves, et al. Neural Turing Machines, 2014, ArXiv.
[23] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[24] Christopher Joseph Pal, et al. Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding, 2018, NeurIPS.
[25] Yoshua Bengio, et al. Unitary Evolution Recurrent Neural Networks, 2015, ICML.
[26] Razvan Pascanu, et al. Relational recurrent neural networks, 2018, NeurIPS.
[27] Richard Socher, et al. A Deep Reinforced Model for Abstractive Summarization, 2017, ICLR.
[28] Ioannis Mitliagkas, et al. h-detach: Modifying the LSTM Gradient Towards Better Optimization, 2018, ICLR.
[29] Mario Lezcano Casado, et al. Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group, 2019, ICML.
[30] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.
[31] Yoshua Bengio, et al. Learning long-term dependencies with gradient descent is difficult, 1994, IEEE Trans. Neural Networks.
[32] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.
[33] Jakob Uszkoreit, et al. A Decomposable Attention Model for Natural Language Inference, 2016, EMNLP.
[34] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[35] Geoffrey E. Hinton, et al. A Simple Way to Initialize Recurrent Networks of Rectified Linear Units, 2015, ArXiv.
[36] Yann LeCun, et al. Orthogonal RNNs and Long-Memory Tasks, 2016, ArXiv.
[37] Tsendsuren Munkhdalai, et al. Metalearned Neural Memory, 2019, NeurIPS.
[38] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.