Paper information - Learning Sublinear-Time Indexing for Nearest Neighbor Search

Most of the efficient sublinear-time indexing algorithms for the high-dimensional nearest neighbor search problem (NNS) are based on space partitions of the ambient space $\mathbb{R}^d$. Inspired by recent theoretical work on NNS for general metric spaces [Andoni, Naor, Nikolov, Razenshteyn, Waingarten STOC 2018, FOCS 2018], we develop a new framework for constructing such partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner [Sanders, Schulz SEA 2013] and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS, our experiments show that the partitions found by Neural LSH consistently outperform partitions found by quantization- and tree-based methods.
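
The two-stage reduction described in the abstract can be illustrated with a minimal Python sketch. It assumes that the graph being partitioned is the k-nearest-neighbor graph of the dataset, and it swaps in much simpler components than the ones named above: a greedy graph-growing heuristic stands in for KaHIP, and a nearest-centroid classifier stands in for the neural network. All function names and parameters are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math
import random
from collections import deque


def dist(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_graph(points, k):
    """Brute-force k-NN graph: for each point, the indices of its k closest points."""
    graph = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        graph.append(sorted(others, key=lambda j: dist(p, points[j]))[:k])
    return graph


def balanced_partition(graph, num_bins):
    """Greedy graph growing: BFS regions capped at ceil(n / num_bins) vertices.
    Assumption: a crude stand-in for a real balanced partitioner such as KaHIP."""
    n = len(graph)
    cap = math.ceil(n / num_bins)
    label = [-1] * n
    next_seed = 0
    for b in range(num_bins):
        size, queue = 0, deque()
        while size < cap:
            if not queue:
                # Region exhausted: restart from the next unassigned vertex.
                while next_seed < n and label[next_seed] != -1:
                    next_seed += 1
                if next_seed == n:
                    break
                queue.append(next_seed)
            v = queue.popleft()
            if label[v] != -1:
                continue
            label[v] = b
            size += 1
            queue.extend(u for u in graph[v] if label[u] == -1)
    return label


def fit_centroids(points, label, num_bins):
    """Supervised step: learn one centroid per bin from the partition labels.
    Assumption: a nearest-centroid classifier replaces the neural network."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(num_bins)]
    counts = [0] * num_bins
    for p, b in zip(points, label):
        counts[b] += 1
        for d in range(dim):
            sums[b][d] += p[d]
    return [tuple(s / max(counts[b], 1) for s in sums[b]) for b in range(num_bins)]


def query(q, points, label, centroids, num_probes):
    """Rank bins with the classifier, then scan only the top-ranked bins."""
    ranked = sorted(range(len(centroids)), key=lambda b: dist(q, centroids[b]))
    probed = set(ranked[:num_probes])
    candidates = [i for i, b in enumerate(label) if b in probed]
    best = min(candidates, key=lambda i: dist(q, points[i]))
    return best, len(candidates)


# Toy dataset: four Gaussian clusters of 50 points each in the plane.
random.seed(0)
points = [(random.gauss(cx, 0.3), random.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (4, 0), (0, 4), (4, 4)] for _ in range(50)]

graph = knn_graph(points, k=5)
label = balanced_partition(graph, num_bins=4)
centroids = fit_centroids(points, label, num_bins=4)

q = (3.8, 0.2)
nearest, scanned = query(q, points, label, centroids, num_probes=1)
print(f"scanned {scanned} of {len(points)} points, candidate neighbor index {nearest}")
```

Probing a single bin scans only a quarter of the toy dataset, and raising `num_probes` trades query time for accuracy. Because the partition is learned from the neighbor graph rather than from a fixed geometric rule, the quality of the index depends on both the partitioner and the classifier, which is where the components named in the abstract would replace the stand-ins used here.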

[1] Ludwig Schmidt, et al. Learning Representations for Faster Similarity Search, 2018.

[2] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[3] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.

[4] Cordelia Schmid, et al. Product Quantization for Nearest Neighbor Search, 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[6] Alexandr Andoni, et al. Hölder Homeomorphisms and Approximate Nearest Neighbors, 2018, IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[7] Alexandr Andoni, et al. Practical and Optimal LSH for Angular Distance, 2015, NIPS.

[8] Alexandr Andoni, et al. Approximate Nearest Neighbor Search in High Dimensions, 2018, Proceedings of the International Congress of Mathematicians (ICM 2018).

[9] Peter Sanders, et al. Think Locally, Act Globally: Highly Balanced Graph Partitioning, 2013, SEA.

[10] Patrick Pérez, et al. SuBiC: A Supervised, Structured Binary Code for Image Search, 2017, IEEE International Conference on Computer Vision (ICCV).

[11] Xuemin Lin, et al. SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index, 2014, Proc. VLDB Endow.

[12] Tim Kraska, et al. The Case for Learned Index Structures, 2018.

[13] Robert F. Sproull, et al. Refinements to nearest-neighbor searching in k-dimensional trees, 1991, Algorithmica.

[14] Sanjoy Dasgupta, et al. A neural algorithm for a fundamental computing problem, 2017.

[15] Martin Aumüller, et al. ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms, 2018, SISAP.

[16] Alexandr Andoni, et al. Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors, 2016, SODA.

[17] Sanjoy Dasgupta, et al. Randomized partition trees for exact nearest neighbor search, 2013, COLT.

[18] Maria-Florina Balcan, et al. Learning to Branch, 2018, ICML.

[19] Cordelia Schmid, et al. Spreading vectors for similarity search, 2018, ICLR.

[20] Jiwen Lu, et al. Deep hashing for compact binary codes learning, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Victor Lempitsky, et al. The inverted multi-index, 2012, CVPR.

[22] Sanjiv Kumar, et al. Multiscale Quantization for Fast Similarity Search, 2017, NIPS.

[23] Matt J. Kusner, et al. From Word Embeddings To Document Distances, 2015, ICML.

[24] Alexandr Andoni, et al. Spectral Approaches to Nearest Neighbor Search, 2014, IEEE 55th Annual Symposium on Foundations of Computer Science.

[25] Mayank Bawa, et al. LSH forest: self-tuning indexes for similarity search, 2005, WWW '05.

[26] Heng Tao Shen, et al. Hashing for Similarity Search: A Survey, 2014, ArXiv.

[27] Lior Wolf, et al. In Defense of Product Quantization, 2017, ArXiv.

[28] Shree K. Nayar, et al. What Is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images?, 2008, ECCV.

[29] Yury A. Malkov, et al. Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30] Wei Liu, et al. Learning to Hash for Indexing Big Data—A Survey, 2015, Proceedings of the IEEE.

[31] Qin Zhang, et al. EmbedJoin: Efficient Edit Similarity Joins via Embeddings, 2017, KDD.

[32] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[33] Alexandr Andoni, et al. Data-dependent hashing via nonlinear spectral gaps, 2018, STOC.

[34] Kaushik Sinha, et al. Improved nearest neighbor search using auxiliary information and priority functions, 2018, ICML.

[35] Sergei Vassilvitskii, et al. Competitive caching with machine learned advice, 2018, ICML.

[36] Jian Sun, et al. Optimized Product Quantization, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37] Michael Mitzenmacher, et al. A Model for Learned Bloom Filters and Optimizing by Sandwiching, 2018, NeurIPS.

[38] David J. Fleet, et al. Cartesian K-Means, 2013, IEEE Conference on Computer Vision and Pattern Recognition.

[39] Zhe Wang, et al. Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search, 2007, VLDB.