Learning Space Partitions for Nearest Neighbor Search

Space partitions of $\mathbb{R}^d$ underlie a vast and important class of fast nearest neighbor search (NNS) algorithms. Inspired by recent theoretical work on NNS for general metric spaces (Andoni et al. 2018b,c), we develop a new framework for building space partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner (Sanders and Schulz 2013) and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS (Aumüller et al. 2017), our experiments show that the partitions obtained by Neural LSH consistently outperform partitions found by quantization-based and tree-based methods as well as classic, data-oblivious LSH.
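
The two-stage reduction described above (balanced graph partitioning, then supervised classification) can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the graph is taken to be a brute-force k-NN graph of the dataset, a greedy balance-preserving swap heuristic stands in for KaHIP, and a nearest-centroid classifier stands in for the neural network. All function names (`knn_graph`, `refine_bisection`, `train_centroids`, `query`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(points, k):
    """Brute-force, symmetrized k-NN graph as a list of neighbor sets."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(points[i], points[j]))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def cut_size(adj, labels):
    """Number of graph edges whose endpoints lie in different parts."""
    return sum(1 for i in range(len(adj)) for j in adj[i]
               if i < j and labels[i] != labels[j])

def refine_bisection(adj, labels, max_swaps=1000):
    """Stand-in for a real balanced partitioner (NOT KaHIP): repeatedly
    swap one vertex from each side while that strictly reduces the cut.
    Swapping pairs keeps both part sizes unchanged."""
    labels = list(labels)

    def gain(i):
        # External minus internal edges of vertex i.
        ext = sum(1 for j in adj[i] if labels[j] != labels[i])
        return ext - (len(adj[i]) - ext)

    for _ in range(max_swaps):
        side0 = [i for i in range(len(labels)) if labels[i] == 0]
        side1 = [i for i in range(len(labels)) if labels[i] == 1]
        if not side0 or not side1:
            break
        i = max(side0, key=gain)
        j = max(side1, key=gain)
        # Exact cut reduction achieved by swapping i and j.
        improvement = gain(i) + gain(j) - 2 * (j in adj[i])
        if improvement <= 0:
            break
        labels[i], labels[j] = 1, 0
    return labels

def train_centroids(points, labels):
    """Stand-in for the learned classifier: one centroid per part."""
    dim = len(points[0])
    centroids = {}
    for b in set(labels):
        members = [p for p, l in zip(points, labels) if l == b]
        centroids[b] = [sum(p[d] for p in members) / len(members)
                        for d in range(dim)]
    return centroids

def query(q, points, labels, centroids):
    """Route q to the predicted part, then scan only that part."""
    b = min(centroids, key=lambda c: sq_dist(q, centroids[c]))
    candidates = [i for i, l in enumerate(labels) if l == b]
    return b, min(candidates, key=lambda i: sq_dist(q, points[i]))
```

A typical call sequence under these assumptions is `adj = knn_graph(points, k)`, then `labels = refine_bisection(adj, initial_labels)` starting from any balanced 0/1 labeling, then `centroids = train_centroids(points, labels)`, after which `query(q, points, labels, centroids)` returns the chosen part and the index of the closest point inside it. The query touches only one part, which is where the speedup of a partition-based index comes from.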

[1]  Le Song,et al.  2 Common Formulation for Greedy Algorithms on Graphs , 2018 .

[2]  Alexandr Andoni,et al.  Hölder Homeomorphisms and Approximate Nearest Neighbors , 2018, 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS).

[3]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[4]  Alexandr Andoni,et al.  Practical and Optimal LSH for Angular Distance , 2015, NIPS.

[5]  Alexandr Andoni,et al.  Optimal Hashing-based Time-Space Trade-offs for Approximate Near Neighbors , 2016, SODA.

[6]  Fan Yang,et al.  LoSHa: A General Framework for Scalable Locality Sensitive Hashing , 2017, SIGIR.

[7]  Xuemin Lin,et al.  SRS: Solving c-Approximate Nearest Neighbor Queries in High Dimensional Euclidean Space with a Tiny Index , 2014, Proc. VLDB Endow..

[8]  Tim Kraska,et al.  The Case for Learned Index Structures , 2018 .

[9]  Ilya P. Razenshteyn,et al.  SANNS: Scaling Up Secure Approximate k-Nearest Neighbors Search , 2019, IACR Cryptol. ePrint Arch..

[10]  King-Sun Fu,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Publication Information , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Peter Sanders,et al.  Think Locally, Act Globally: Highly Balanced Graph Partitioning , 2013, SEA.

[12]  Sanjoy Dasgupta,et al.  A neural algorithm for a fundamental computing problem , 2017 .

[13]  Robert F. Sproull,et al.  Refinements to nearest-neighbor searching in k-dimensional trees , 1991, Algorithmica.

[14]  Wei Liu,et al.  Learning to Hash for Indexing Big Data—A Survey , 2015, Proceedings of the IEEE.

[15]  Matt J. Kusner,et al.  From Word Embeddings To Document Distances , 2015, ICML.

[16]  Martin Aumüller,et al.  ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms , 2018, SISAP.

[17]  Qin Zhang,et al.  EmbedJoin: Efficient Edit Similarity Joins via Embeddings , 2017, KDD.

[18]  Ashish Goel,et al.  Efficient distributed locality sensitive hashing , 2012, CIKM.

[19]  Richard G. Baraniuk,et al.  Learned D-AMP: Principled Neural Network based Compressive Image Recovery , 2017, NIPS.

[20]  Michael Mitzenmacher,et al.  A Model for Learned Bloom Filters and Optimizing by Sandwiching , 2018, NeurIPS.

[21]  Victor S. Lempitsky,et al.  The Inverted Multi-Index , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Richard G. Baraniuk,et al.  A deep learning approach to structured signal recovery , 2015, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[23]  Sanjoy Dasgupta,et al.  A learning framework for nearest neighbor search , 2007, NIPS.

[24]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[25]  Yihong Gong,et al.  Learning to Search Efficiently in High Dimensions , 2011, NIPS.

[26]  Ludwig Schmidt,et al.  Learning Representations for Faster Similarity Search , 2018 .

[27]  Alexandr Andoni,et al.  Data-dependent hashing via nonlinear spectral gaps , 2018, STOC.

[28]  Alexandros G. Dimakis,et al.  Compressed Sensing using Generative Models , 2017, ICML.

[29]  Jeff Johnson,et al.  Billion-Scale Similarity Search with GPUs , 2017, IEEE Transactions on Big Data.

[30]  Svetlana Lazebnik,et al.  Iterative quantization: A procrustean approach to learning binary codes , 2011, CVPR 2011.

[31]  Kaushik Sinha,et al.  Improved nearest neighbor search using auxiliary information and priority functions , 2018, ICML.

[32]  Sergei Vassilvitskii,et al.  Competitive caching with machine learned advice , 2018, ICML.

[33]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[34]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Alexandr Andoni,et al.  Approximate Nearest Neighbor Search in High Dimensions , 2018, Proceedings of the International Congress of Mathematicians (ICM 2018).

[36]  Jiwen Lu,et al.  Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Aditya Bhaskara,et al.  Distributed Clustering via LSH Based Data Partitioning , 2018, ICML.

[38]  Volkan Cevher,et al.  Learning-Based Compressive Subsampling , 2015, IEEE Journal of Selected Topics in Signal Processing.

[39]  David J. Fleet,et al.  Cartesian K-Means , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Parikshit Ram,et al.  Which Space Partitioning Tree to Use for Search? , 2013, NIPS.

[41]  Zhe Wang,et al.  Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search , 2007, VLDB.

[42]  Maria-Florina Balcan,et al.  Learning to Branch , 2018, ICML.

[43]  Cordelia Schmid,et al.  Spreading vectors for similarity search , 2018, ICLR.

[44]  Sanjiv Kumar,et al.  Multiscale Quantization for Fast Similarity Search , 2017, NIPS.

[45]  Manish Purohit,et al.  Improving Online Algorithms via ML Predictions , 2018, NeurIPS.

[46]  Heng Tao Shen,et al.  Hashing for Similarity Search: A Survey , 2014, ArXiv.

[47]  Lior Wolf,et al.  In Defense of Product Quantization , 2017, ArXiv.

[48]  Shree K. Nayar,et al.  What Is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images? , 2008, ECCV.

[49]  Yury A. Malkov,et al.  Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Sanjoy Dasgupta,et al.  Randomized partition trees for exact nearest neighbor search , 2013, COLT.

[51]  Alexandr Andoni,et al.  Spectral Approaches to Nearest Neighbor Search , 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.

[52]  Mayank Bawa,et al.  LSH forest: self-tuning indexes for similarity search , 2005, WWW '05.

[53]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.