Fast and Accurate Hashing Via Iterative Nearest Neighbors Expansion

Recently, hashing techniques have been widely applied to approximate nearest neighbor search in many real applications. The basic idea of these approaches is to generate binary codes for data points that preserve the similarity between any two of them. Given a query, instead of performing a linear scan of the entire database, a hashing method performs a linear scan of only those points whose Hamming distance to the query is not greater than $r_h$, where $r_h$ is a constant. However, in order to find the true nearest neighbors, both the locating time and the linear scan time are $O\left(\sum_{i=0}^{r_h} \binom{c}{i}\right)$, where $c$ is the code length, and this cost increases exponentially as $r_h$ increases. To address this limitation, we propose a novel algorithm named iterative expanding hashing, which builds an auxiliary index based on an offline-constructed nearest neighbor table to avoid a large $r_h$. This auxiliary index can be easily combined with all traditional hashing methods. Extensive experimental results over various real large-scale datasets demonstrate the superiority of the proposed approach.
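
To make the cost term concrete, the following minimal Python sketch counts and enumerates the hash buckets within a given Hamming radius of a query code. It is an illustration only, not the paper's implementation: the function names (`hamming_ball_size`, `hamming_ball`, `candidates`) and the choice to store the hash table as a plain dictionary from integer codes to lists of point ids are assumptions made for this example.

```python
from itertools import combinations
from math import comb


def hamming_ball_size(c: int, r: int) -> int:
    """Number of c-bit codes within Hamming distance r of a fixed code."""
    return sum(comb(c, i) for i in range(r + 1))


def hamming_ball(code: int, c: int, r: int):
    """Yield every c-bit code (as an int) within Hamming distance r of `code`."""
    for i in range(r + 1):
        for positions in combinations(range(c), i):
            probe = code
            for p in positions:
                probe ^= 1 << p  # flip bit p
            yield probe


def candidates(table: dict, query_code: int, c: int, r: int) -> list:
    """Collect the ids stored in all buckets within radius r of the query code.

    `table` is a hypothetical hash table mapping integer codes to id lists.
    """
    found = []
    for probe in hamming_ball(query_code, c, r):
        found.extend(table.get(probe, []))
    return found


if __name__ == "__main__":
    # Bucket count for a 32-bit code as the search radius grows.
    for r in (0, 1, 2, 4, 8):
        print(r, hamming_ball_size(32, r))
```

Because every bucket in the ball must be probed whether or not it is occupied, the number of lookups grows with the sum of binomial coefficients, which is why keeping $r_h$ small matters.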
