[1] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[2] Nicolas Usunier, et al. Improving Neural Language Models with a Continuous Cache, 2016, ICLR.
[3] Ilya Sutskever, et al. Subword Language Modeling with Neural Networks, 2011.
[4] James Demmel, et al. Reducing BERT Pre-Training Time from 3 Days to 76 Minutes, 2019, arXiv.
[5] Richard Socher, et al. Regularizing and Optimizing LSTM Language Models, 2017, ICLR.
[6] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[7] Richard Socher, et al. An Analysis of Neural Language Modeling at Multiple Scales, 2018, arXiv.
[8] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[9] Noah Constant, et al. Character-Level Language Modeling with Deeper Self-Attention, 2018, AAAI.
[10] Lior Wolf, et al. Using the Output Embedding to Improve Language Models, 2016, EACL.
[11] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[12] Fabrice Bellard. Lossless Data Compression with Neural Networks, 2019.
[13] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[14] Jürgen Schmidhuber, et al. Recurrent Highway Networks, 2016, ICML.
[15] Noam Shazeer, et al. Fast Transformer Decoding: One Write-Head is All You Need, 2019, arXiv.
[16] Wonyong Sung, et al. Character-Level Language Modeling with Hierarchical Recurrent Neural Networks, 2017, ICASSP.
[17] Hakan Inan, et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling, 2016, ICLR.
[18] Jeffrey L. Elman. Finding Structure in Time, 1990, Cognitive Science.
[19] Phil Blunsom, et al. Mogrifier LSTM, 2020, ICLR.