Lukasz Kaiser, Christian Szegedy, Henryk Michalewski, Yuhuai Wu, Piotr Nawrot, Szymon Tworkowski, Michal Tyrolski
[1] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2021, TACL.
[2] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[3] John Wieting, et al. CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation, 2021, ArXiv.
[4] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, ArXiv.
[5] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[6] Jure Leskovec, et al. Combiner: Full Attention Transformer with Sparse Computation Cost, 2021, NeurIPS.
[7] Kyungwoo Song, et al. Score Matching Model for Unbounded Data Score, 2021, ArXiv.
[8] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[9] Tim Salimans, et al. Axial Attention in Multidimensional Transformers, 2019, ArXiv.
[10] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[11] Mark Chen, et al. Generative Pretraining From Pixels, 2020, ICML.
[12] Dustin Tran, et al. Image Transformer, 2018, ICML.
[13] Jonathan Ho, et al. Variational Diffusion Models, 2021, ArXiv.
[14] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[15] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[16] Zhen Qin, et al. Charformer: Fast Character Transformers via Gradient-based Subword Tokenization, 2021, ArXiv.
[17] Guokun Lai, et al. Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing, 2020, NeurIPS.
[18] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[19] Anima Anandkumar, et al. Long-Short Transformer: Efficient Transformers for Language and Vision, 2021, NeurIPS.
[20] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[21] Jason Weston, et al. Not All Memories are Created Equal: Learning to Forget by Expiring, 2021, ICML.
[22] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[23] Wojciech Zaremba, et al. Evaluating Large Language Models Trained on Code, 2021, ArXiv.
[24] Colin Raffel, et al. ByT5: Towards a token-free future with pre-trained byte-to-byte models, 2021, ArXiv.
[25] Douglas Eck, et al. Music Transformer, 2018, ArXiv:1809.04281.
[26] Thomas Brox, et al. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, MICCAI.
[27] Zheng Zhang, et al. BP-Transformer: Modelling Long-Range Context via Binary Partitioning, 2019, ArXiv.
[28] Marc'Aurelio Ranzato, et al. Multi-scale Transformer Language Models, 2020, ArXiv.
[29] Shengfeng Pan, et al. RoFormer: Enhanced Transformer with Rotary Position Embedding, 2021, ArXiv.
[30] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[31] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[32] Alex Graves, et al. Conditional Image Generation with PixelCNN Decoders, 2016, NIPS.
[33] Ilya Sutskever, et al. Jukebox: A Generative Model for Music, 2020, ArXiv.