A Prior-Free Weighting Scheme for Binary Code Ranking

Fast similarity search has been a research focus in recent years. Binary hashing, which embeds high-dimensional data points into Hamming space, is a promising way to accelerate similarity search, since the search can be performed in real time using Hamming distance as the similarity metric. However, because Hamming distance is discrete and bounded by the code length, its resolution is limited. In practice, many results often share the same Hamming distance to a query, which is a critical issue for applications where ranking matters. This paper proposes a weighted Hamming distance ranking algorithm (WhRank) that gives a better ranking of results with equal Hamming distances to a query. By assigning a separate weight to each bit, WhRank can distinguish the relative importance of different bits and rank results at the finer-grained hash code level rather than at the original integer Hamming distance level. We show that an effective weight is not only data-adaptive but also query-sensitive, and give a simple yet effective prior-free weight learning algorithm. Evaluations on three large-scale image datasets containing up to one million points demonstrate the efficacy of the proposed algorithm.
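
The tie-breaking effect of bit-level weights can be illustrated with a minimal Python sketch. It is not the WhRank weight learning algorithm: the 4-bit codes, the function names, and the weight values below are hypothetical placeholders chosen for illustration, and in the proposed method the weights would be learned and would depend on both the data and the query.

```python
def hamming_distance(a, b):
    """Plain Hamming distance: number of differing bits."""
    return sum(x != y for x, y in zip(a, b))


def weighted_hamming_distance(a, b, weights):
    """Sum of the weights of the differing bits."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


# Hypothetical 4-bit codes, for illustration only.
query = [1, 0, 1, 1]
codes = [
    [1, 0, 1, 0],  # differs from the query in bit 3
    [0, 0, 1, 1],  # differs from the query in bit 0
    [1, 1, 1, 1],  # differs from the query in bit 1
    [1, 0, 1, 1],  # identical to the query
]

# Hypothetical per-bit weights for this query (assumed, not learned here).
weights = [0.9, 0.5, 0.7, 0.2]

plain = [hamming_distance(query, c) for c in codes]
weighted = [weighted_hamming_distance(query, c, weights) for c in codes]

# Rank database items by ascending weighted distance.
ranking = sorted(range(len(codes)), key=lambda i: weighted[i])

print(plain)    # three of the four codes tie at distance 1
print(ranking)  # the weighted distance orders the tied codes
```

Under plain Hamming distance the first three codes are indistinguishable, since each differs from the query in exactly one bit. With the assumed weights, a flip in a low-weight bit counts as a smaller difference than a flip in a high-weight bit, so the tied codes receive distinct ranks.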
