Efficient and interactive spatial-semantic image retrieval

This paper proposes an efficient image retrieval system. When users wish to retrieve images that satisfy both semantic and spatial constraints (e.g., a horse is located at the center of the image and a person is riding it), conventional text-based retrieval systems have difficulty retrieving such images exactly. In contrast, the proposed system considers both semantic and spatial information because it is based on semantic segmentation using fully convolutional networks (FCNs). The system accepts three types of query images: a segmentation map sketched by the user, a natural image, or a combination of the two. The distance between the query and each image in the database is calculated from the output probability maps of the FCN. To make the system efficient in both computational time and memory usage, we employ product quantization (PQ). The experimental results show that PQ is compatible with the FCN-based image retrieval system and that the quantization process causes little information loss. They also show that our method outperforms a conventional text-based search system.
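The abstract does not spell out the distance function or the PQ settings, so the following Python sketch only illustrates the general idea under stated assumptions. Each FCN output is treated as a per-class probability map, flattened into one vector, and compared with squared Euclidean distance. A user-sketched segmentation map is converted to a one-hot map so that it lives in the same space. Standard product quantization with asymmetric distance computation (ADC) then approximates that distance from compact codes. The random softmax maps stand in for real FCN outputs, and every name here (`flatten`, `onehot_from_labels`, `softmax_map`, `ProductQuantizer`) is hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: the distance metric, map size, PQ settings, and the
# random "FCN outputs" below are assumptions, not the paper's configuration.
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flatten(prob_map):
    """Flatten a [class][row][col] probability map into a single vector."""
    return [p for cls in prob_map for row in cls for p in row]


def onehot_from_labels(label_map, num_classes):
    """Turn a sketched [row][col] label map into a one-hot [class][row][col] map."""
    return [[[1.0 if lab == c else 0.0 for lab in row] for row in label_map]
            for c in range(num_classes)]


def softmax_map(rng, num_classes, h, w):
    """Random stand-in for an FCN output: a per-pixel softmax over classes."""
    out = [[[0.0] * w for _ in range(h)] for _ in range(num_classes)]
    for y in range(h):
        for x in range(w):
            z = [math.exp(rng.gauss(0, 2)) for _ in range(num_classes)]
            total = sum(z)
            for c in range(num_classes):
                out[c][y][x] = z[c] / total
    return out


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


class ProductQuantizer:
    """Split vectors into m subvectors and quantize each with its own k-means codebook."""

    def __init__(self, num_subspaces, num_centroids):
        self.m, self.k = num_subspaces, num_centroids
        self.codebooks = []

    def _split(self, v):
        assert len(v) % self.m == 0, "dimension must be divisible by m"
        step = len(v) // self.m
        return [v[i * step:(i + 1) * step] for i in range(self.m)]

    def fit(self, vectors):
        parts = [self._split(v) for v in vectors]
        self.codebooks = [kmeans([p[s] for p in parts], self.k, seed=s)
                          for s in range(self.m)]
        return self

    def encode(self, v):
        """Replace each subvector by the index of its nearest sub-centroid."""
        return [min(range(self.k), key=lambda c: sqdist(sub, self.codebooks[s][c]))
                for s, sub in enumerate(self._split(v))]

    def decode(self, code):
        return [x for s, c in enumerate(code) for x in self.codebooks[s][c]]

    def adc(self, query, codes):
        """Asymmetric distances: the query stays uncompressed; each database
        distance is a sum of m table lookups instead of a full vector pass."""
        tables = [[sqdist(sub, cent) for cent in self.codebooks[s]]
                  for s, sub in enumerate(self._split(query))]
        return [sum(tables[s][c] for s, c in enumerate(code)) for code in codes]


rng = random.Random(42)
C, H, W = 3, 4, 4  # 3 classes on a 4x4 grid -> 48-dimensional vectors
database = [flatten(softmax_map(rng, C, H, W)) for _ in range(60)]
pq = ProductQuantizer(num_subspaces=4, num_centroids=8).fit(database)
codes = [pq.encode(v) for v in database]  # 4 small ints per image, not 48 floats

# A user-sketched segmentation map used as the query (class ids per cell).
sketch = [[0, 0, 1, 1], [0, 2, 2, 1], [0, 2, 2, 1], [0, 0, 1, 1]]
query = flatten(onehot_from_labels(sketch, C))
dists = pq.adc(query, codes)
ranking = sorted(range(len(database)), key=dists.__getitem__)
print(ranking[:5])
```

In this sketch, ADC keeps the query exact and compresses only the database, so each stored image shrinks to m small integers and each distance costs m table lookups. The abstract does not say whether the paper uses asymmetric or symmetric distances, or how a combined sketch-plus-image query is formed, so those choices remain open here.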