Meta-Learning Fast Weight Language Models
[1] Quoc V. Le, et al. Transformer Quality in Linear Time, 2022, ICML.
[2] Christopher D. Manning, et al. Fast Model Editing at Scale, 2021, ICLR.
[3] Kevin Gimpel, et al. Reconsidering the Past: Optimizing Hidden States in Language Models, 2021, EMNLP.
[4] Kazuki Irie, et al. Going Beyond Linear Transformers with Recurrent Fast Weight Programmers, 2021, NeurIPS.
[5] Kazuki Irie, et al. Linear Transformers Are Secretly Fast Weight Programmers, 2021, ICML.
[6] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2020, TACL.
[7] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[8] Omer Levy, et al. Generalization through Memorization: Nearest Neighbor Language Models, 2019, ICLR.
[9] Oriol Vinyals, et al. Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML, 2019, ICLR.
[10] Yejin Choi, et al. The Curious Case of Neural Text Degeneration, 2019, ICLR.
[11] Tsendsuren Munkhdalai, et al. Metalearned Neural Memory, 2019, NeurIPS.
[12] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[13] Steve Renals, et al. Dynamic Evaluation of Transformer Language Models, 2019, arXiv.
[14] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[15] Steve Renals, et al. Dynamic Evaluation of Neural Sequence Models, 2017, ICML.
[16] Sergey Levine, et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, 2017, ICML.
[17] Hong Yu, et al. Meta Networks, 2017, ICML.
[18] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[19] Moustapha Cissé, et al. Efficient softmax approximation for GPUs, 2016, ICML.
[20] Geoffrey E. Hinton, et al. Using Fast Weights to Attend to the Recent Past, 2016, NIPS.
[21] Geoffrey E. Hinton, et al. Layer Normalization, 2016, arXiv.
[22] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[23] Jürgen Schmidhuber, et al. Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks, 1992, Neural Computation.
[24] Geoffrey E. Hinton. Using fast weights to deblur old memories, 1987.