MCMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao