LambdaNetworks: Modeling Long-Range Interactions Without Attention
