Learning Customized Visual Models with Retrieval-Augmented Knowledge
[1] Ludwig Schmidt,et al. LAION-5B: An open large-scale dataset for training next generation image-text models , 2022, NeurIPS.
[2] William W. Cohen,et al. MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text , 2022, EMNLP.
[3] William W. Cohen,et al. Re-Imagen: Retrieval-Augmented Text-to-Image Generator , 2022, ICLR.
[4] Seung Wook Kim,et al. UniCLIP: Unified Framework for Contrastive Language-Image Pre-training , 2022, NeurIPS.
[5] Ashish V. Thapliyal,et al. PaLI: A Jointly-Scaled Multilingual Language-Image Model , 2022, ArXiv.
[6] Fang Wen,et al. MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining , 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Z. Tu,et al. Semi-supervised Vision Transformers at Scale , 2022, NeurIPS.
[8] N. Codella,et al. Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training , 2022, ECCV.
[9] Rodolphe Jenatton,et al. Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts , 2022, NeurIPS.
[10] X. Zhang,et al. Prefix Conditioning Unifies Language and Label Supervision , 2022, ArXiv.
[11] S. Levine,et al. Multimodal Masked Autoencoders Learn Transferable Representations , 2022, ArXiv.
[12] Zirui Wang,et al. CoCa: Contrastive Captioners are Image-Text Foundation Models , 2022, Trans. Mach. Learn. Res.
[13] Oriol Vinyals,et al. Flamingo: a Visual Language Model for Few-Shot Learning , 2022, NeurIPS.
[14] Chunhua Shen,et al. PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining , 2022, NeurIPS.
[15] Trevor Darrell,et al. K-LITE: Learning Transferable Visual Models with External Knowledge , 2022, NeurIPS.
[16] Yong Jae Lee,et al. ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models , 2022, NeurIPS.
[17] Michael G. Rabbat,et al. Masked Siamese Networks for Label-Efficient Learning , 2022, ECCV.
[18] Jianfeng Gao,et al. Unified Contrastive Learning in Image-Text-Label Space , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Yaniv Taigman,et al. KNN-Diffusion: Image Generation via Large-Scale Retrieval , 2022, ICLR.
[20] Jianfeng Gao,et al. Focal Modulation Networks , 2022, NeurIPS.
[21] Chunhua Shen,et al. Retrieval Augmented Classification for Long-Tail Visual Recognition , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Trishul M. Chilimbi,et al. Vision-Language Pre-Training with Triple Contrastive Learning , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Saining Xie,et al. SLIP: Self-supervision meets Language-Image Pre-training , 2021, ECCV.
[24] Lu Yuan,et al. RegionCLIP: Region-based Language-Image Pretraining , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Diego de Las Casas,et al. Improving language models by retrieving from trillions of tokens , 2021, ICML.
[26] Chen Change Loy,et al. Extract Free Dense Labels from CLIP , 2021, ECCV.
[27] Quoc V. Le,et al. Combined Scaling for Open-Vocabulary Image Classification , 2022.
[28] Daniel Keysers,et al. LiT: Zero-Shot Transfer with Locked-image text Tuning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Ross B. Girshick,et al. Masked Autoencoders Are Scalable Vision Learners , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Zhenguo Li,et al. FILIP: Fine-grained Interactive Language-Image Pre-Training , 2021, ICLR.
[31] Shuohang Wang,et al. Dict-BERT: Enhancing Language Model Pre-training with Dictionary , 2021, Findings.
[32] Junjie Yan,et al. Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm , 2021, ICLR.
[33] Zhe Gan,et al. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA , 2021, AAAI.
[34] Chen Change Loy,et al. Learning to Prompt for Vision-Language Models , 2021, International Journal of Computer Vision.
[35] Hannaneh Hajishirzi,et al. Cross-Task Generalization via Natural Language Crowdsourcing Instructions , 2021, ACL.
[36] Roozbeh Mottaghi,et al. Multi-Modal Answer Validation for Knowledge-Based VQA , 2021, AAAI.
[37] M. Lewis,et al. Retrieval-Augmented Multimodal Language Modeling , 2022, ArXiv.
[38] B. Ommer,et al. Retrieval-Augmented Diffusion Models , 2022, NeurIPS.
[39] Noah A. Smith,et al. Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks , 2022, ArXiv.
[40] Lu Yuan,et al. Florence: A New Foundation Model for Computer Vision , 2021, ArXiv.
[41] Tao Kong,et al. iBOT: Image BERT Pre-Training with Online Tokenizer , 2021, ArXiv.
[42] Jenia Jitsev,et al. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs , 2021, ArXiv.
[43] Jason Baldridge,et al. MURAL: Multimodal, Multitask Retrieval Across Languages , 2021, ArXiv.
[44] Oriol Vinyals,et al. Multimodal Few-Shot Learning with Frozen Language Models , 2021, NeurIPS.
[45] Julien Mairal,et al. Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[46] Armand Joulin,et al. Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[47] Armand Joulin,et al. Self-supervised Pretraining of Visual Features in the Wild , 2021, ArXiv.
[48] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[49] Radu Soricut,et al. Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[51] Marcus Rohrbach,et al. KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Yuning Jiang,et al. Learning the Best Pooling Strategy for Visual Semantic Embedding , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[53] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[54] D. Song,et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[55] Jeff Johnson,et al. Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.
[56] Geoffrey E. Hinton,et al. Big Self-Supervised Models are Strong Semi-Supervised Learners , 2020, NeurIPS.
[57] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[58] Ming-Wei Chang,et al. REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.
[59] Lin Su,et al. ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data , 2020, ArXiv.
[60] S. Gelly,et al. Big Transfer (BiT): General Visual Representation Learning , 2019, ECCV.
[61] Ross B. Girshick,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Quoc V. Le,et al. Self-Training With Noisy Student Improves ImageNet Classification , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Omer Levy,et al. Generalization through Memorization: Nearest Neighbor Language Models , 2019, ICLR.
[64] Zhe Zhao,et al. K-BERT: Enabling Language Representation with Knowledge Graph , 2019, AAAI.
[65] Yury A. Malkov,et al. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[66] Noah A. Smith,et al. Knowledge Enhanced Contextual Word Representations , 2019, EMNLP.
[67] Abhinav Gupta,et al. Scaling and Benchmarking Self-Supervised Visual Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[68] Eric P. Xing,et al. Learning Robust Global Representations by Penalizing Local Predictive Power , 2019, NeurIPS.
[69] Quoc V. Le,et al. Do Better ImageNet Models Transfer Better? , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[70] Christoph H. Lampert,et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[71] Radu Soricut,et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning , 2018, ACL.
[72] Max Welling,et al. Rotation Equivariant CNNs for Digital Pathology , 2018, MICCAI.
[73] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[74] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[75] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[76] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[77] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[78] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[79] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.
[80] Subhransu Maji,et al. Fine-Grained Visual Classification of Aircraft , 2013, ArXiv.
[81] Iryna Gurevych,et al. Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography , 2012.
[82] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[83] Christiane Fellbaum,et al. Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.
[84] H. J. Scudder,et al. Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.