A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search (ANNS) is a fundamental problem in many areas of machine learning and data mining. During the past decade, numerous hashing algorithms have been proposed to solve this problem. Every proposed algorithm claims to outperform other state-of-the-art hashing methods. However, the evaluation in these hashing papers was not thorough enough, and those claims should be re-examined. The ultimate goal of an ANNS method is to return the most accurate answers (nearest neighbors) in the shortest time. If implemented correctly, almost all hashing methods will have their performance improved as the code length increases. However, many existing hashing papers only report performance with code lengths shorter than 128. In this paper, we carefully revisit the problem of search with a hash index and analyze the pros and cons of two popular hash index search procedures. We then propose a very simple but effective two-level index structure and make a thorough comparison of eleven popular hashing algorithms. Surprisingly, random-projection-based Locality Sensitive Hashing (LSH) is the best-performing algorithm, which contradicts the claims in all the other ten hashing papers. Despite the extreme simplicity of random-projection-based LSH, our results show that the capability of this algorithm has been far underestimated. For the sake of reproducibility, all the code used in the paper is released on GitHub, where it can be used as a testing platform for a fair comparison between various hashing algorithms.
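
To make the idea of random-projection-based LSH concrete, the following is a minimal sketch and not the paper's released code. It hashes each vector to a binary code by taking the signs of random Gaussian projections, ranks the database by Hamming distance to the query code, and then reranks a short candidate list by exact distance. The abstract does not name the two search procedures it analyzes, nor does it describe the two-level index, so the Hamming-ranking search used here is an assumption chosen for illustration. All function names, parameters, and the toy data are hypothetical.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in R^dim, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, projections):
    """Pack the signs of the projections of vec into one integer code."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, vec))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, data, codes, projections, n_candidates, k):
    """Rank the database by Hamming distance, then rerank candidates exactly."""
    q = hash_code(query, projections)
    order = sorted(range(len(data)), key=lambda i: hamming(q, codes[i]))
    candidates = order[:n_candidates]

    def sqdist(i):
        return sum((a - b) ** 2 for a, b in zip(query, data[i]))

    return sorted(candidates, key=sqdist)[:k]


# Toy usage on synthetic data (hypothetical sizes, not from the paper).
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
projections = make_projections(dim=16, n_bits=64)
codes = [hash_code(x, projections) for x in data]
result = search(data[0], data, codes, projections, n_candidates=20, k=5)
```

In this sketch, raising `n_bits` makes the Hamming ranking a finer proxy for angular similarity at the cost of more work per code, which is the trade-off behind the abstract's remark about code length. Raising `n_candidates` trades search time for accuracy in the same way.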

Deng Cai
