Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Kate Saenko | Kihyuk Sohn | Tomas Pfister | Chen-Yu Lee | Kuniaki Saito | Xiang Zhang | Chun-Liang Li
[1] Hiroaki Hayashi, et al. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing, 2021, ACM Comput. Surv.
[2] Yuanzhen Li, et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, 2022, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Amit H. Bermano, et al. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion, 2022, ICLR.
[4] Xiatian Zhu, et al. FashionViL: Fashion-Focused Vision-and-Language Representation Learning, 2022, ECCV.
[5] Wenhao Jiang, et al. VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix, 2022, ICML.
[6] A. Bimbo, et al. Effective conditioned and composed image retrieval combining CLIP-based features, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] P. Natarajan, et al. FashionVLP: Vision Language Transformer for Fashion Retrieval with Feedback, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Zirui Wang, et al. CoCa: Contrastive Captioners are Image-Text Foundation Models, 2022, Trans. Mach. Learn. Res.
[9] Oriol Vinyals, et al. Flamingo: a Visual Language Model for Few-Shot Learning, 2022, NeurIPS.
[10] Zeynep Akata, et al. Probabilistic Compositional Embeddings for Multimodal Image Retrieval, 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[11] Gal Chechik, et al. "This is my unicorn, Fluffy": Personalizing frozen vision-language representations, 2022, ECCV.
[12] Rafael Sampaio de Rezende, et al. ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity, 2022, 2203.08101.
[13] Li Dong, et al. CLIP Models are Few-Shot Learners: Empirical Studies on VQA and Visual Entailment, 2022, ACL.
[14] S. Hoi, et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, 2022, ICML.
[15] Marcus Rohrbach, et al. FLAVA: A Foundational Language And Vision Alignment Model, 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Ron Mokady, et al. ClipCap: CLIP Prefix for Image Captioning, 2021, ArXiv.
[17] Stephen Gould, et al. Image Retrieval on Real-life Images with Pre-trained Vision-and-Language Models, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Junnan Li, et al. Align before Fuse: Vision and Language Representation Learning with Momentum Distillation, 2021, NeurIPS.
[19] Bohyung Han, et al. CoSMo: Content-Style Modulation for Image Retrieval with Text Feedback, 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Brian Lester, et al. The Power of Scale for Parameter-Efficient Prompt Tuning, 2021, EMNLP.
[21] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[22] Quoc V. Le, et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.
[23] D. Song, et al. The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization, 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[24] Steven J. Rennie, et al. Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback, 2019, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Loris Bazzani, et al. Learning Joint Visual Semantic Matching Embeddings for Language-Guided Retrieval, 2020, ECCV.
[26] Yang Zhang, et al. Modality-Agnostic Attention Fusion for visual search with text feedback, 2020, ArXiv.
[27] Yang Wang, et al. Composed Query Image Retrieval Using Locally Bounded Features, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Shaogang Gong, et al. Image Search With Text Feedback by Visiolinguistic Attention Learning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[29] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[30] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[31] Ahmed El Kholy, et al. UNITER: Learning UNiversal Image-TExt Representations, 2019, ECCV 2020.
[32] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[33] Li Fei-Fei, et al. Composing Text and Image for Image Retrieval - an Empirical Odyssey, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[35] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[36] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[37] Ling Shao, et al. Deep Sketch Hashing: Fast Free-Hand Sketch-Based Image Retrieval, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Kihyuk Sohn, et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective, 2016, NIPS.
[39] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[40] Fei-Fei Li, et al. ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[41] James Ze Wang, et al. Image retrieval: Ideas, influences, and trends of the new age, 2008, CSUR.