LambdaNetworks: Modeling Long-Range Interactions Without Attention