Data-oriented multi-index hashing

Multi-index hashing (MIH) is the state-of-the-art method for indexing binary codes: it divides long codes into substrings and builds one hash table per substring. However, MIH assumes that the dataset codes are uniformly distributed, and it loses efficiency on non-uniformly distributed codes. In addition, many results share the same Hamming distance to a query, which makes the distance measure ambiguous. In this paper, we propose a data-oriented multi-index hashing method. We first compute the covariance matrix of the bits and learn an adaptive projection vector for each binary substring. Instead of using the substrings directly as indices into the hash tables, we project them with the corresponding projection vectors to generate new indices. With adaptive projection, the indices in each hash table are nearly uniformly distributed. Then, using the covariance matrix, we propose a ranking method for the binary codes. By assigning different bit-level weights to different bits, the returned binary codes are ranked at a finer-grained binary code level. Experiments conducted on reference large-scale datasets show that, compared to MIH, our method improves time performance by 36.9%-87.4% and search accuracy by 22.2%.
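As a rough illustration of the two ideas the abstract builds on, the sketch below shows baseline MIH substring indexing and how bit-level weights can separate codes that tie under plain Hamming distance. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the 8-bit toy codes, and the weight values are all hypothetical, and the adaptive projection step is omitted because the abstract does not specify how the projection vectors are learned.

```python
from collections import defaultdict

def split_code(code, num_bits, m):
    """Split a num_bits-bit integer code into m equal-width substrings."""
    width = num_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_tables(codes, num_bits, m):
    """Baseline MIH: one hash table per substring position."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for t, sub in enumerate(split_code(code, num_bits, m)):
            tables[t][sub].append(idx)
    return tables

def probe_exact(tables, query, num_bits, m):
    """Collect codes that match the query exactly in at least one substring.

    By the pigeonhole principle this finds every code whose Hamming
    distance to the query is smaller than m.
    """
    found = set()
    for t, sub in enumerate(split_code(query, num_bits, m)):
        found.update(tables[t].get(sub, []))
    return found

def hamming(a, b):
    """Plain Hamming distance between two integer codes."""
    return bin(a ^ b).count("1")

def weighted_hamming(a, b, weights):
    """Sum the weights of the differing bits (weights[i] is for bit i, LSB first)."""
    diff = a ^ b
    return sum(w for i, w in enumerate(weights) if (diff >> i) & 1)

# Toy data: four 8-bit codes split into m = 2 substrings of 4 bits.
codes = [0b00000001, 0b10000000, 0b00000011, 0b00010001]
query = 0b00000000
tables = build_tables(codes, num_bits=8, m=2)
candidates = probe_exact(tables, query, num_bits=8, m=2)

# Hypothetical bit weights; the paper derives its weights from the
# covariance matrix, which this sketch does not attempt to reproduce.
weights = [0.5, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.5]
ranked = sorted(candidates, key=lambda i: weighted_hamming(query, codes[i], weights))
```

In this toy setting, codes 0 and 1 are both at Hamming distance 1 from the query, so plain Hamming distance cannot order them, while the weighted distance (0.5 versus 1.5) does. Code 3 differs from the query in both substrings and is therefore never retrieved by the exact-match probe.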
