Hongyu Ren | Hanjun Dai | Zihang Dai | Mengjiao Yang | Jure Leskovec | Dale Schuurmans | Bo Dai
[1] Sashank J. Reddi, et al. $O(n)$ Connections are Expressive Enough: Universal Approximability of Sparse Transformers, 2020, NeurIPS.
[2] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[3] Shuang Xu, et al. Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition, 2018, ICASSP.
[4] Andrew M. Dai, et al. Music Transformer: Generating Music with Long-Term Structure, 2018, ICLR.
[5] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[6] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.
[7] Mark Chen, et al. Distribution Augmentation for Generative Modeling, 2020, ICML.
[8] Dustin Tran, et al. Image Transformer, 2018, ICML.
[9] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, arXiv.
[10] Junjie Yan, et al. Factorized Attention: Self-Attention with Linear Complexities, 2018, arXiv.
[11] Ruslan Salakhutdinov, et al. Breaking the Softmax Bottleneck: A High-Rank RNN Language Model, 2017, ICLR.
[12] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[13] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[14] Yiming Yang, et al. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context, 2019, ACL.
[15] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[16] Nikhil Naik, et al. ProGen: Language Modeling for Protein Generation, 2020, bioRxiv.
[17] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, arXiv.
[18] Nal Kalchbrenner, et al. Generating High Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling, 2018, ICLR.
[19] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[20] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[21] Aurko Roy, et al. Efficient Content-Based Sparse Attention with Routing Transformers, 2021, TACL.
[22] Timothy P. Lillicrap, et al. Compressive Transformers for Long-Range Sequence Modelling, 2019, ICLR.
[23] Tim Salimans, et al. Axial Attention in Multidimensional Transformers, 2019, arXiv.
[24] Prafulla Dhariwal, et al. Glow: Generative Flow with Invertible 1x1 Convolutions, 2018, NeurIPS.
[25] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[26] Liu Yang, et al. Long Range Arena: A Benchmark for Efficient Transformers, 2020, ICLR.
[27] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.
[28] Wenhu Chen, et al. Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting, 2019, NeurIPS.
[29] Alexander J. Smola, et al. Deep Sets, 2017, arXiv:1703.06114.
[30] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[31] Aditya Kanade, et al. Learning and Evaluating Contextual Embedding of Source Code, 2019, ICML.
[32] Roy Schwartz, et al. Random Feature Attention, 2021, ICLR.
[33] Yunchao Wei, et al. CCNet: Criss-Cross Attention for Semantic Segmentation, 2018, ICCV 2019.
[34] Sergio Gomez Colmenarejo, et al. Parallel Multiscale Autoregressive Density Estimation, 2017, ICML.
[35] Pieter Abbeel, et al. PixelSNAIL: An Improved Autoregressive Generative Model, 2017, ICML.
[36] Zihang Dai, et al. Wiki-40B: Multilingual Language Model Dataset, 2020, LREC.
[37] Ankur Bapna, et al. The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation, 2018, ACL.
[38] Inderjit S. Dhillon, et al. Memory Efficient Kernel Approximation, 2014, ICML.
[39] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, JMLR.
[40] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[41] Xi Chen, et al. PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications, 2017, ICLR.
[42] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, arXiv.