An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
暂无分享,去创建一个
S. Gelly | A. Dosovitskiy | Thomas Unterthiner | Jakob Uszkoreit | G. Heigold | Alexander Kolesnikov | Xiaohua Zhai | L. Beyer | N. Houlsby | M. Dehghani | Dirk Weissenborn | Matthias Minderer | Mostafa Dehghani | Lucas Beyer | Alexey Dosovitskiy | Mostafa Dehghani | Georg Heigold | Alexey Dosovitskiy
[1] Alexander D'Amour,et al. On Robustness and Transferability of Convolutional Neural Networks , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Orhan Firat,et al. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding , 2020, ICLR.
[3] Mark Chen,et al. Generative Pretraining From Pixels , 2020, ICML.
[4] Thomas Kipf,et al. Object-Centric Learning with Slot Attention , 2020, NeurIPS.
[5] Kurt Keutzer,et al. Visual Transformers: Token-based Image Representation and Processing for Computer Vision , 2020, ArXiv.
[6] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[7] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[8] Willem Zuidema,et al. Quantifying Attention Flow in Transformers , 2020, ACL.
[9] Vladlen Koltun,et al. Exploring Self-Attention for Image Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Matthijs Douze,et al. Fixing the train-test resolution discrepancy: FixEfficientNet , 2020, ArXiv.
[11] A. Yuille,et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation , 2020, ECCV.
[12] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[13] S. Gelly,et al. Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.
[14] S. Gelly,et al. Self-Supervised Learning of Video-Induced Visual Invariances , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Martin Jaggi,et al. On the Relationship between Self-Attention and Convolutional Layers , 2019, ICLR.
[18] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[19] Jakob Uszkoreit,et al. Scaling Autoregressive Video Models , 2019, ICLR.
[20] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.
[21] André Susano Pinto,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019, 1910.04867.
[22] Tim Salimans,et al. Axial Attention in Multidimensional Transformers , 2019, ArXiv.
[23] Cho-Jui Hsieh,et al. VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.
[24] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[25] Matthijs Douze,et al. Fixing the train-test resolution discrepancy , 2019, NeurIPS.
[26] Ashish Vaswani,et al. Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.
[27] Jingbo Zhu,et al. Learning Deep Transformer Models for Machine Translation , 2019, ACL.
[28] R Devon Hjelm,et al. Learning Representations by Maximizing Mutual Information Across Views , 2019, NeurIPS.
[29] Stephen Lin,et al. Local Relation Networks for Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[30] Ilya Sutskever,et al. Generating Long Sequences with Sparse Transformers , 2019, ArXiv.
[31] Quoc V. Le,et al. Attention Augmented Convolutional Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[33] Yunchao Wei,et al. CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Alexei Baevski,et al. Adaptive Input Representations for Neural Language Modeling , 2018, ICLR.
[35] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[36] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[37] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.
[38] Dustin Tran,et al. Image Transformer , 2018, ICML.
[39] Yichen Wei,et al. Relation Networks for Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[41] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[42] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[43] Tim Salimans,et al. Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks , 2016, NIPS.
[44] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[46] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[47] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[48] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[49] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[50] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[51] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[52] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[53] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.