LambdaNetworks: Modeling Long-Range Interactions Without Attention
