Lu Yuan | Pengchuan Zhang | Xiyang Dai | Jianwei Yang | Jianfeng Gao | Chunyuan Li | Bin Xiao | Mei Gao
[1] Jianfeng Gao, et al. Unified Vision-Language Pre-Training for Image Captioning and VQA, 2020, AAAI.
[2] Paolo Favaro, et al. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles, 2016, ECCV.
[3] Jonathon Shlens, et al. Scaling Local Self-Attention for Parameter Efficient Visual Backbones, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[5] Geoffrey E. Hinton, et al. Big Self-Supervised Models are Strong Semi-Supervised Learners, 2020, NeurIPS.
[6] Junying Chen, et al. UP-DETR: Unsupervised Pre-training for Object Detection with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[8] Quoc V. Le, et al. Selfie: Self-supervised Pretraining for Image Embedding, 2019, ArXiv.
[9] David J. Schwab, et al. Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs, 2020, ICLR.
[10] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.
[11] Chunhua Shen, et al. End-to-End Video Instance Segmentation with Transformers, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Jian Yang, et al. Selective Kernel Networks, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Zhe Gan, et al. Variational Autoencoder for Deep Learning of Images, Labels and Captions, 2016, NIPS.
[14] Jeff Donahue, et al. Large Scale Adversarial Representation Learning, 2019, NeurIPS.
[15] Xiaogang Wang, et al. End-to-End Object Detection with Adaptive Clustering Transformer, 2020, BMVC.
[16] Tao Kong, et al. Dense Contrastive Learning for Self-Supervised Visual Pre-Training, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Frank Hutter, et al. Fixing Weight Decay Regularization in Adam, 2017, ArXiv.
[18] Lu Yuan, et al. Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding, 2021, ArXiv.
[19] Jordi Pont-Tuset, et al. The Open Images Dataset V4, 2018, International Journal of Computer Vision.
[20] Yoshua Bengio, et al. Learning deep representations by mutual information estimation and maximization, 2018, ICLR.
[21] R Devon Hjelm, et al. Learning Representations by Maximizing Mutual Information Across Views, 2019, NeurIPS.
[22] Julien Mairal, et al. Emerging Properties in Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Alexei A. Efros, et al. Unsupervised Visual Representation Learning by Context Prediction, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Jianfeng Gao, et al. Self-supervised Pre-training with Hard Examples Improves Visual Representations, 2020, ArXiv.
[25] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Baining Guo, et al. Learning Texture Transformer Network for Image Super-Resolution, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers, 2020, ArXiv.
[28] Furu Wei, et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations, 2019, ICLR.
[29] Dhruv Batra, et al. Joint Unsupervised Learning of Deep Representations and Image Clusters, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Junnan Li, et al. Prototypical Contrastive Learning of Unsupervised Representations, 2020, ICLR.
[31] Alexei A. Efros, et al. Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[33] Xu Ji, et al. Invariant Information Clustering for Unsupervised Image Classification and Segmentation, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[35] Kaiming He, et al. Improved Baselines with Momentum Contrastive Learning, 2020, ArXiv.
[36] Thomas Brox, et al. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[37] Georg Heigold, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2021, ICLR.
[38] Nan Duan, et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, 2019, AAAI.
[39] Stephen Lin, et al. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Nicolas Usunier, et al. End-to-End Object Detection with Transformers, 2020, ECCV.
[41] Aäron van den Oord, et al. Divide and Contrast: Self-supervised Learning from Uncurated Data, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[42] Matthijs Douze, et al. Deep Clustering for Unsupervised Learning of Visual Features, 2018, ECCV.
[43] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[44] Matthieu Cord, et al. Training data-efficient image transformers & distillation through attention, 2020, ICML.
[45] Bin Li, et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection, 2020, ICLR.
[46] Ahmed El Kholy, et al. UNITER: Learning UNiversal Image-TExt Representations, 2019, ECCV 2020.
[47] Julien Mairal, et al. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, 2020, NeurIPS.
[48] Saining Xie, et al. An Empirical Study of Training Self-Supervised Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[49] Alexander Kolesnikov, et al. MLP-Mixer: An all-MLP Architecture for Vision, 2021, NeurIPS.
[50] Ling Shao, et al. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, 2021, ArXiv.
[51] Alexei A. Efros, et al. Context Encoders: Feature Learning by Inpainting, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[53] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[54] Cordelia Schmid, et al. What makes for good views for contrastive learning, 2020, NeurIPS.
[55] Wei Li, et al. WebVision Database: Visual Learning and Understanding from Web Data, 2017, ArXiv.
[56] Alexei A. Efros, et al. Colorful Image Colorization, 2016, ECCV.
[57] Michal Valko, et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, 2020, NeurIPS.
[58] Ali Farhadi, et al. Unsupervised Deep Embedding for Clustering Analysis, 2015, ICML.
[59] Alec Radford, et al. Improving Language Understanding by Generative Pre-Training, 2018.
[60] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[61] Yutong Lin, et al. Self-Supervised Learning with Swin Transformers, 2021, ArXiv.
[62] Chen Change Loy, et al. Online Deep Clustering for Unsupervised Representation Learning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Mark Chen, et al. Generative Pretraining From Pixels, 2020, ICML.
[64] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[65] Armand Joulin, et al. Self-supervised Pretraining of Visual Features in the Wild, 2021, ArXiv.
[66] Nikos Komodakis, et al. Unsupervised Representation Learning by Predicting Image Rotations, 2018, ICLR.
[67] Oriol Vinyals, et al. Representation Learning with Contrastive Predictive Coding, 2018, ArXiv.
[68] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[69] Geoffrey E. Hinton, et al. A Simple Framework for Contrastive Learning of Visual Representations, 2020, ICML.
[70] Kaiming He, et al. Momentum Contrast for Unsupervised Visual Representation Learning, 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[71] Wen Gao, et al. Pre-Trained Image Processing Transformer, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[72] N. Codella, et al. CvT: Introducing Convolutions to Vision Transformers, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[73] Gregory Shakhnarovich, et al. Learning Representations for Automatic Colorization, 2016, ECCV.