Jing Gu | Qingyang Wu | Zhou Yu | Zhenzhong Lan
[1] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[2] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[3] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.
[4] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, arXiv.
[5] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[6] Richard Socher, et al. Pointer Sentinel Mixture Models, 2016, ICLR.
[7] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, arXiv.
[8] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[9] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[10] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[11] Tianqi Chen, et al. Training Deep Nets with Sublinear Memory Cost, 2016, arXiv.
[12] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[13] Omer Levy, et al. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, 2019, ACL.
[14] Kenneth Heafield, et al. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), Volume 1: Long Papers, August 7-12, 2016, Berlin, Germany, ACL.
[15] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[16] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[17] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.