Contextual Hashing for Large-Scale Image Search

With the explosive growth of multimedia data on the Web, content-based image search has attracted considerable attention in the multimedia and computer vision communities. The most popular approach is based on the bag-of-visual-words model with invariant local features. Since the spatial context among local features is critical for identifying visual content, many methods exploit the geometric clues of local features, including location, scale, and orientation, for explicit geometric post-verification. However, because full geometric verification is computationally expensive, usually only a few of the initially top-ranked results are geometrically verified. In this paper, we propose to encode the spatial context of local features as binary codes and to achieve geometric verification implicitly through efficient comparison of those codes. In addition, we explore the multimode property of local features to further boost retrieval performance. Experiments on the Holidays, Paris, and Oxford Buildings benchmark data sets demonstrate the effectiveness of the proposed algorithm.
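
To make the idea of "efficient comparison of the binary codes" concrete, the following is a minimal sketch and not the paper's actual algorithm. It assumes each local feature carries a visual-word id and a short binary context code stored as an integer, and that a candidate match is kept only when the two codes differ in at most a few bits. The function names, the one-code-per-visual-word simplification, and the `max_distance` threshold are all hypothetical and chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def verify_matches(query_codes, db_codes, max_distance=2):
    """Keep visual words whose context codes are close in Hamming space.

    query_codes, db_codes: dicts mapping a visual-word id to a binary
    context code (hypothetical layout: one code per visual word).
    max_distance: hypothetical threshold on the number of differing bits.
    """
    kept = []
    for word, q_code in query_codes.items():
        # A match requires the same visual word AND a similar context code.
        if word in db_codes and hamming_distance(q_code, db_codes[word]) <= max_distance:
            kept.append(word)
    return kept


if __name__ == "__main__":
    query = {1: 0b1111, 2: 0b0000, 3: 0b1010}
    database = {1: 0b1110, 2: 0b1111, 4: 0b1010}
    print(verify_matches(query, database, max_distance=1))
```

Because the comparison reduces to an XOR followed by a bit count, it can be applied to every candidate match rather than only to a short list of top-ranked images, which is the cost argument the abstract makes against full geometric verification.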
