Spatial-Content Image Search in Complex Scenes

Although image search has been heavily studied over the last two decades, much prior work has focused on either instance-level retrieval or semantic-level retrieval. In this work, we develop a novel visually similar spatial-semantic search method, namely spatial-content image search, which retrieves images of complex scenes that not only share the same spatial semantics as the query image but are also visually consistent with it. We achieve this by capturing the spatial-semantic concepts contained in an image together with the visual representation of each concept. Specifically, we first use YOLOv3 to generate a set of bounding boxes and their category labels, which represent spatial-semantic constraints, and then describe the visual content of each bounding box with deep features extracted from a convolutional neural network. We then design a customized similarity computation that evaluates the relevance between dataset images and input queries based on these image representations. Experimental results on two large-scale benchmark retrieval datasets whose images contain multiple objects demonstrate that our method provides an effective way to query image databases. Our code is available at https://github.com/MaJinWakeUp/spatial-content.
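The abstract does not spell out the customized similarity computation. As a rough illustration of the representation it describes (one category label, bounding box, and deep feature per detected concept), here is a minimal Python sketch. The scoring rule is an assumption made for illustration, not the authors' method: each query concept is matched only against candidate concepts with the same label, a match is scored as a mix of box overlap (IoU) and feature cosine similarity weighted by a hypothetical `alpha`, and the best match per query concept is averaged. `Concept`, `similarity`, and `rank` are hypothetical names; features are plain lists standing in for CNN outputs, and boxes and labels stand in for detector (e.g. YOLOv3) outputs. The linked repository holds the actual implementation.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# (x1, y1, x2, y2), assumed normalized to [0, 1] so images of different sizes compare
Box = Tuple[float, float, float, float]


@dataclass
class Concept:
    label: str                # category label, e.g. from a detector such as YOLOv3
    box: Box                  # normalized bounding box of the concept
    feature: Sequence[float]  # deep feature of the box region (stand-in for CNN output)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def similarity(query: List[Concept], candidate: List[Concept], alpha: float = 0.5) -> float:
    """Hypothetical spatial-content score between a query and a candidate image."""
    if not query:
        return 0.0
    total = 0.0
    for q in query:
        best = 0.0  # unmatched query concepts contribute 0
        for c in candidate:
            if c.label != q.label:
                continue  # spatial-semantic constraint: categories must agree
            # spatial agreement (IoU) mixed with visual agreement (cosine)
            score = alpha * iou(q.box, c.box) + (1 - alpha) * cosine(q.feature, c.feature)
            best = max(best, score)
        total += best
    return total / len(query)


def rank(query: List[Concept], database: Dict[str, List[Concept]]) -> List[str]:
    """Return database keys sorted by descending similarity to the query."""
    return sorted(database, key=lambda k: similarity(query, database[k]), reverse=True)


# Toy example: a dog on the left and a person on the right.
query = [Concept("dog", (0.1, 0.5, 0.4, 0.9), [1.0, 0.0, 0.2]),
         Concept("person", (0.5, 0.1, 0.8, 0.9), [0.0, 1.0, 0.1])]
database = {
    "img_a": [Concept("dog", (0.1, 0.5, 0.4, 0.9), [0.9, 0.1, 0.2]),
              Concept("person", (0.5, 0.1, 0.8, 0.9), [0.1, 0.9, 0.1])],
    "img_b": [Concept("dog", (0.6, 0.5, 0.9, 0.9), [0.9, 0.1, 0.2])],  # dog on the wrong side
    "img_c": [Concept("cat", (0.1, 0.5, 0.4, 0.9), [1.0, 0.0, 0.2])],  # wrong category
}
print(rank(query, database))  # ['img_a', 'img_b', 'img_c']
```

Requiring labels to match enforces the spatial-semantic constraint, the IoU term rewards objects in the same place, and the cosine term rewards visual consistency, so `img_b` (right category, wrong position) ranks between the full match and the wrong-category image. A practical system would index the per-concept features rather than scan every database image.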