Three Towers: Flexible Contrastive Learning with Pretrained Image Models
暂无分享,去创建一个
Rodolphe Jenatton | Xiaohua Zhai | L. Beyer | Xiao Wang | Efi Kokiopoulou | Jannik Kossen | Mark Collier | Basil Mustafa | Jesse Berent | A. Steiner
[1] Yu-Gang Jiang,et al. ResFormer: Scaling ViTs with Multi-Resolution Training , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Ludwig Schmidt,et al. LAION-5B: An open large-scale dataset for training next generation image-text models , 2022, NeurIPS.
[3] Ashish V. Thapliyal,et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model , 2022, ICLR.
[4] Michael W. Dusenberry,et al. Plex: Towards Reliability using Pretrained Large Model Extensions , 2022, ArXiv.
[5] David J. Fleet,et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding , 2022, NeurIPS.
[6] Zirui Wang,et al. CoCa: Contrastive Captioners are Image-Text Foundation Models , 2022, Trans. Mach. Learn. Res..
[7] Jianfeng Gao,et al. Unified Contrastive Learning in Image-Text-Label Space , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Chen Change Loy,et al. Conditional Prompt Learning for Vision-Language Models , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Ari S. Morcos,et al. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time , 2022, ICML.
[10] Prafulla Dhariwal,et al. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models , 2021, ICML.
[11] Daniel Keysers,et al. LiT: Zero-Shot Transfer with Locked-image text Tuning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Bing Liu,et al. Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP , 2021, AAAI.
[13] Jong Wook Kim,et al. Robust fine-tuning of zero-shot models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Chen Change Loy,et al. Learning to Prompt for Vision-Language Models , 2021, International Journal of Computer Vision.
[15] Alexander Kolesnikov,et al. Scaling Vision Transformers , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Nan Duan,et al. CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval , 2021, Neurocomputing.
[17] Christopher D. Manning,et al. Contrastive Learning of Medical Visual Representations from Paired Images and Text , 2020, MLHC.
[18] Lu Yuan,et al. Florence: A New Foundation Model for Computer Vision , 2021, ArXiv.
[19] Quoc V. Le,et al. Combined Scaling for Zero-shot Transfer Learning , 2021, Neurocomputing.
[20] Xiaohua Zhai,et al. Revisiting the Calibration of Modern Neural Networks , 2021, NeurIPS.
[21] Stanislav Fort,et al. Exploring the Limits of Out-of-Distribution Detection , 2021, NeurIPS.
[22] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[23] Andrew Zisserman,et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Pieter Abbeel,et al. Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[25] Daniel Cohen-Or,et al. StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[27] Radu Soricut,et al. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[29] Yuning Jiang,et al. Learning the Best Pooling Strategy for Visual Semantic Embedding , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[31] D. Song,et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[32] Dawn Song,et al. Natural Adversarial Examples , 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Julien Perez,et al. Learning Visual Representations with Caption Annotations , 2020, ECCV.
[34] Pierre H. Richemond,et al. Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning , 2020, NeurIPS.
[35] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[36] S. Gelly,et al. Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.
[37] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Yonglong Tian,et al. Contrastive Representation Distillation , 2019, ICLR.
[39] André Susano Pinto,et al. A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark , 2019, 1910.04867.
[40] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[41] Thomas G. Dietterich,et al. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations , 2018, ICLR.
[42] Andreas Dengel,et al. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification , 2017, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.
[43] Boris Katz,et al. ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models , 2019, NeurIPS.
[44] Taku Kudo,et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing , 2018, EMNLP.
[45] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[46] Stella X. Yu,et al. Unsupervised Feature Learning via Non-parametric Instance Discrimination , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[47] Bolei Zhou,et al. Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[48] Kaiming He,et al. Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.
[49] Noam Shazeer,et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost , 2018, ICML.
[50] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[51] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[52] Xiaoqiang Lu,et al. Remote Sensing Image Scene Classification: Benchmark and State of the Art , 2017, Proceedings of the IEEE.
[53] Allan Jabri,et al. Learning Visual N-Grams from Web Data , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[54] Kevin Gimpel,et al. A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks , 2016, ICLR.
[55] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.
[57] Kevin Gimpel,et al. Gaussian Error Linear Units (GELUs) , 2016, 1606.08415.
[58] Francesco Bianconi,et al. Multi-class texture analysis in colorectal cancer histology , 2016, Scientific Reports.
[59] Allan Jabri,et al. Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.
[60] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[61] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[62] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[63] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[64] Milos Hauskrecht,et al. Obtaining Well Calibrated Probabilities Using Bayesian Binning , 2015, AAAI.
[65] Yoshua Bengio,et al. FitNets: Hints for Thin Deep Nets , 2014, ICLR.
[66] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[67] Krista A. Ehinger,et al. SUN Database: Exploring a Large Collection of Scene Categories , 2014, International Journal of Computer Vision.
[68] Iasonas Kokkinos,et al. Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[69] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[70] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.
[71] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[72] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[73] Shawn D. Newsam,et al. Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.
[74] Qiang Yang,et al. A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.
[75] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[76] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[77] Katja Markert,et al. Learning Models for Object Recognition from Natural Language Descriptions , 2009, BMVC.
[78] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[79] Trevor Darrell,et al. Learning Visual Representations using Images with Captions , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[80] G. Griffin,et al. Caltech-256 Object Category Dataset , 2007 .
[81] A. Raftery,et al. Strictly Proper Scoring Rules, Prediction, and Estimation , 2007 .
[82] Y. Mori,et al. Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .