暂无分享,去创建一个
Jiwen Lu | Guan Huang | Jie Zhou | Wenliang Zhao | Yongming Rao | Zheng Zhu | Guangyi Chen | Yansong Tang
[1] Stephen Lin,et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[2] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[3] Lucas Beyer,et al. Big Transfer (BiT): General Visual Representation Learning , 2020, ECCV.
[4] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[5] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[6] Kai Chen,et al. MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.
[7] Georg Heigold,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.
[8] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[10] Chen Change Loy,et al. Learning to Prompt for Vision-Language Models , 2021, ArXiv.
[11] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[12] Xiaogang Wang,et al. CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Kaiming He,et al. Panoptic Feature Pyramid Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Zhe Gan,et al. Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[18] Kaiming He,et al. Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Mengmeng Wang,et al. ActionCLIP: A New Paradigm for Video Action Recognition , 2021, ArXiv.
[20] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[21] Stephen Lin,et al. Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Stefan Lee,et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.
[23] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[24] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[25] Xilin Chen,et al. Object-Contextual Representations for Semantic Segmentation , 2019, ECCV.
[26] Lihi Zelnik-Manor,et al. ImageNet-21K Pretraining for the Masses , 2021, NeurIPS Datasets and Benchmarks.
[27] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[28] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[29] Maosong Sun,et al. CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models , 2021, ArXiv.
[30] Yunchao Wei,et al. CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[31] Zheng Zhang,et al. Disentangled Non-Local Neural Networks , 2020, ECCV.
[32] Kaiming He,et al. Improved Baselines with Momentum Contrastive Learning , 2020, ArXiv.
[33] Yuning Jiang,et al. Unified Perceptual Parsing for Scene Understanding , 2018, ECCV.
[34] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.
[35] Tao Kong,et al. Dense Contrastive Learning for Self-Supervised Visual Pre-Training , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[37] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[39] Peng Gao,et al. Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling , 2021, ArXiv.
[40] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[41] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[42] Bolei Zhou,et al. Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.
[43] Ling Shao,et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, ArXiv.
[44] Tao Xiang,et al. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Hiroaki Hayashi,et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing , 2021, ACM Comput. Surv..
[46] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[47] Yutong Lin,et al. Self-Supervised Learning with Swin Transformers , 2021, ArXiv.
[48] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[49] Xiaogang Wang,et al. Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[51] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[52] Trevor Darrell,et al. Segmentation from Natural Language Expressions , 2016, ECCV.
[53] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[54] Xiaogang Wang,et al. Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Peng Gao,et al. CLIP-Adapter: Better Vision-Language Models with Feature Adapters , 2021, Int. J. Comput. Vis..