暂无分享,去创建一个
[1] Omer Levy,et al. Generalization through Memorization: Nearest Neighbor Language Models , 2020, ICLR.
[2] Yi Tay,et al. Efficient Transformers: A Survey , 2020, ArXiv.
[3] Yiming Yang,et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context , 2019, ACL.
[4] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[5] Tie-Yan Liu,et al. Rethinking Positional Encoding in Language Pre-training , 2020, ICLR.
[6] Omer Levy,et al. Improving Transformer Models by Reordering their Sublayers , 2020, ACL.
[7] Jason Weston,et al. Curriculum learning , 2009, ICML '09.
[8] Colin Raffel,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[9] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[10] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[11] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[12] Ashish Vaswani,et al. Self-Attention with Relative Position Representations , 2018, NAACL.
[13] Aurko Roy,et al. Efficient Content-Based Sparse Attention with Routing Transformers , 2021, TACL.
[14] Arman Cohan,et al. Longformer: The Long-Document Transformer , 2020, ArXiv.
[15] Lior Wolf,et al. Using the Output Embedding to Improve Language Models , 2016, EACL.
[16] Alexei Baevski,et al. Adaptive Input Representations for Neural Language Modeling , 2018, ICLR.
[17] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[18] Richard Socher,et al. Pointer Sentinel Mixture Models , 2016, ICLR.
[19] Timothy P. Lillicrap,et al. Compressive Transformers for Long-Range Sequence Modelling , 2019, ICLR.
[20] Omer Levy,et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.
[21] Lukasz Kaiser,et al. Reformer: The Efficient Transformer , 2020, ICLR.
[22] Hakan Inan,et al. Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling , 2016, ICLR.
[23] Edouard Grave,et al. Adaptive Attention Span in Transformers , 2019, ACL.