Improved Hamming Distance Search Using Variable Length Hashing

This paper addresses the problem of ultra-large-scale search in Hamming spaces. There has been considerable research on generating compact binary codes in vision, for example for visual search tasks. However the issue of efficient searching through huge sets of binary codes remains largely unsolved. To this end, we propose a novel, unsupervised approach to thresholded search in Hamming space, supporting long codes (e.g. 512-bits) with a wide-range of Hamming distance radii. Our method is capable of working efficiently with billions of codes delivering between one to three orders of magnitude acceleration, as compared to prior art. This is achieved by relaxing the equal-size constraint in the Multi-Index Hashing approach, leading to multiple hash-tables with variable length hash-keys. Based on the theoretical analysis of the retrieval probabilities of multiple hash-tables we propose a novel search algorithm for obtaining a suitable set of hash-key lengths. The resulting retrieval mechanism is shown empirically to improve the efficiency over the state-of-the-art, across a range of datasets, bit-depths and retrieval thresholds.

[1]  Bart Thomee,et al.  New trends and ideas in visual concept detection: the MIR flickr retrieval evaluation initiative , 2010, MIR '10.

[2]  Shih-Fu Chang,et al.  Semi-Supervised Hashing for Large-Scale Search , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  S. Voloshynovskiy,et al.  Statistical analysis of binarized SIFT descriptors , 2011, 2011 7th International Symposium on Image and Signal Processing and Analysis (ISPA).

[4]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  David J. Fleet,et al.  Fast Exact Search in Hamming Space With Multi-Index Hashing , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  D. Lowe,et al.  Fast Matching of Binary Features , 2012, 2012 Ninth Conference on Computer and Robot Vision.

[8]  Miroslaw Bober,et al.  Robust and scalable aggregation of local features for ultra large-scale retrieval , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[9]  Matthijs Douze,et al.  Searching in one billion vectors: Re-rank with source coding , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[10]  F. Frances Yao,et al.  Multi-index hashing for information retrieval , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[11]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[12]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Svetlana Lazebnik,et al.  Locality-sensitive binary codes from shift-invariant kernels , 2009, NIPS.

[14]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.