Alaaeldin El-Nouby | Hugo Touvron | Mathilde Caron | Piotr Bojanowski | Matthijs Douze | Armand Joulin | Ivan Laptev | Natalia Neverova | Gabriel Synnaeve | Jakob Verbeek | Hervé Jégou
[1] Andrew Zisserman, et al. Perceiver: General Perception with Iterative Attention, 2021, ICML.
[2] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[3] Li Yang, et al. ETC: Encoding Long and Structured Inputs in Transformers, 2020, EMNLP.
[4] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, ArXiv.
[5] Tim Salimans, et al. Axial Attention in Multidimensional Transformers, 2019, ArXiv.
[6] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[7] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[8] Jonathan Krause, et al. 3D Object Representations for Fine-Grained Categorization, 2013, IEEE International Conference on Computer Vision Workshops.
[9] Yang Song, et al. The iNaturalist Species Classification and Detection Dataset, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[10] Michael Isard, et al. Object Retrieval with Large Vocabularies and Fast Spatial Matching, 2007, IEEE Conference on Computer Vision and Pattern Recognition.
[11] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[12] Joshua Ainslie, et al. FNet: Mixing Tokens with Fourier Transforms, 2021, NAACL.
[13] Matthijs Douze, et al. LeViT: A Vision Transformer in ConvNet's Clothing for Faster Inference, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Cordelia Schmid, et al. Aggregating Local Image Descriptors into Compact Codes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Lukasz Kaiser, et al. Rethinking Attention with Performers, 2020, ArXiv.
[17] Benjamin Recht, et al. Do ImageNet Classifiers Generalize to ImageNet?, 2019, ICML.
[18] Yuning Jiang, et al. Unified Perceptual Parsing for Scene Understanding, 2018, ECCV.
[19] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[20] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[21] Gedas Bertasius, et al. Is Space-Time Attention All You Need for Video Understanding?, 2021, ICML.
[22] Cordelia Schmid, et al. Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search, 2008, ECCV.
[23] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[24] Matthieu Cord, et al. Training Data-Efficient Image Transformers & Distillation Through Attention, 2020, ICML.
[25] Lu Yuan, et al. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding, 2021, ArXiv.
[26] Albert Gordo, et al. End-to-End Learning of Deep Visual Representations for Image Retrieval, 2016, International Journal of Computer Vision.
[27] Matthieu Cord, et al. ResMLP: Feedforward Networks for Image Classification with Data-Efficient Training, 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[28] Iasonas Kokkinos, et al. MultiGrain: A Unified Image Embedding for Classes and Instances, 2019, ArXiv.
[29] Ivan Laptev, et al. Training Vision Transformers for Image Retrieval, 2021, ArXiv.
[30] Giorgos Tolias, et al. Fine-Tuning CNN Image Retrieval with No Human Annotation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Irwan Bello. LambdaNetworks: Modeling Long-Range Interactions Without Attention, 2021, ICLR.
[32] Andrew Zisserman, et al. Automated Flower Classification over a Large Number of Classes, 2008, Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[33] Enhua Wu, et al. Squeeze-and-Excitation Networks, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[34] Alexander Kolesnikov, et al. MLP-Mixer: An All-MLP Architecture for Vision, 2021, NeurIPS.
[35] Kaiming He, et al. Feature Pyramid Networks for Object Detection, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Glenn M. Fung, et al. Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention, 2021, AAAI.
[37] Jean Ponce, et al. A Theoretical Analysis of Feature Pooling in Visual Recognition, 2010, ICML.
[38] Julien Mairal, et al. Emerging Properties in Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[39] K. Simonyan, et al. High-Performance Large-Scale Image Recognition Without Normalization, 2021, ICML.
[40] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions, 2021, ArXiv.
[41] Tao Xiang, et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Nikolaos Pappas, et al. Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, 2020, ICML.
[43] Dustin Tran, et al. Image Transformer, 2018, ICML.
[44] Quoc V. Le, et al. RandAugment: Practical Automated Data Augmentation with a Reduced Search Space, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[45] Fengwei Yu, et al. Incorporating Convolution Designs into Visual Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Abhinav Gupta, et al. Non-local Neural Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Yutong Lin, et al. Self-Supervised Learning with Swin Transformers, 2021, ArXiv.
[48] Enhua Wu, et al. Transformer in Transformer, 2021, NeurIPS.
[49] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[50] Giorgos Tolias, et al. Learning and Aggregating Deep Local Descriptors for Instance-Level Recognition, 2020, ECCV.
[51] Kilian Q. Weinberger, et al. Deep Networks with Stochastic Depth, 2016, ECCV.
[52] Matthieu Cord, et al. Going Deeper with Image Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[53] Ronan Sicre, et al. Particular Object Retrieval with Integral Max-Pooling of CNN Activations, 2015, ICLR.
[54] Yannis Avrithis, et al. Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Luc Van Gool, et al. Dynamic Filter Networks, 2016, NIPS.
[56] Matthijs Douze, et al. Fixing the Train-Test Resolution Discrepancy, 2019, NeurIPS.
[57] Levent Sagun, et al. ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases, 2021, ICML.
[58] Kai Chen, et al. MMDetection: Open MMLab Detection Toolbox and Benchmark, 2019, ArXiv.
[59] Shuicheng Yan, et al. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet, 2021, ArXiv.
[60] Omer Levy, et al. Blockwise Self-Attention for Long Document Understanding, 2020, EMNLP.
[61] Guiguang Ding, et al. RepMLP: Re-parameterizing Convolutions into Fully-Connected Layers for Image Recognition, 2021, ArXiv.
[62] Kaiming He, et al. Designing Network Design Spaces, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Edouard Grave, et al. Adaptive Attention Span in Transformers, 2019, ACL.
[64] Cordelia Schmid, et al. ViViT: A Video Vision Transformer, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[65] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[66] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[67] David A. Shamma, et al. YFCC100M, 2015, Communications of the ACM.
[68] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[69] Yannis Avrithis, et al. Image Search with Selective Match Kernels: Aggregation Across Single and Multiple Images, 2016, International Journal of Computer Vision.
[70] Kaiming He, et al. Group Normalization, 2018, ECCV.
[71] Vladlen Koltun, et al. Exploring Self-Attention for Image Recognition, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[72] Shuai Yi, et al. Efficient Attention: Attention with Linear Complexities, 2018, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[73] Li Fei-Fei, et al. ImageNet: A Large-Scale Hierarchical Image Database, 2009, CVPR.
[74] Christoph Feichtenhofer, et al. Multiscale Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[75] Luke Melas-Kyriazi, et al. Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet, 2021, ArXiv.
[76] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[77] Alex Lamb, et al. Coordination Among Neural Modules Through a Shared Global Workspace, 2021, ArXiv.
[78] Vladlen Koltun, et al. Vision Transformers for Dense Prediction, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[79] Ross B. Girshick, et al. Mask R-CNN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[80] Matthijs Douze, et al. Fixing the Train-Test Resolution Discrepancy: FixEfficientNet, 2020, ArXiv.
[81] Michael Isard, et al. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, 2007, IEEE 11th International Conference on Computer Vision.
[82] Kaiming He, et al. Panoptic Feature Pyramid Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[83] Bolei Zhou, et al. Scene Parsing through ADE20K Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).