Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval
Jordi Pont-Tuset | Radu Soricut | Vittorio Ferrari | Soravit Changpinyo