Pruning Algorithms for Low-Dimensional Non-metric k-NN Search: A Case Study

We study low-dimensional non-metric search, where tree-based methods offer efficient and accurate retrieval with short indexing times. These methods partition the space and rely on a pruning rule to avoid visiting unpromising partitions. We consider two known data-driven approaches to extending such rules to non-metric spaces: TriGen and a piecewise-linear approximation of the pruning rule. Because TriGen does not support non-symmetric distances, we propose and evaluate two adaptations of TriGen to non-symmetric similarities. We also evaluate a hybrid of TriGen and the piecewise-linear pruning rule, and find that the hybrid is often more effective than either pruning rule alone. We make our software publicly available.
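To illustrate what a pruning rule decides at a single node of a pivot-based tree, the following is a minimal sketch, not the authors' software. It assumes a node that splits points into those inside and those outside a ball of radius `radius` around a pivot, and a piecewise-linear rule with two stretch coefficients. The names `PiecewiseLinearPruner`, `alpha_left`, and `alpha_right` are hypothetical, and the coefficients are fixed by hand here rather than fitted to data.

```python
from dataclasses import dataclass


@dataclass
class PiecewiseLinearPruner:
    """Hypothetical piecewise-linear pruning rule for one ball-partition node.

    With alpha_left == alpha_right == 1 the rule reduces to the usual
    triangle-inequality rule for metric spaces. Other values are assumed
    to be tuned on data, which this sketch does not do.
    """
    alpha_left: float = 1.0   # slope used when the query falls inside the ball
    alpha_right: float = 1.0  # slope used when the query falls outside the ball

    def bound(self, dist_to_pivot: float, radius: float) -> float:
        """Estimated distance from the query to the partition boundary."""
        if dist_to_pivot <= radius:
            return self.alpha_left * (radius - dist_to_pivot)
        return self.alpha_right * (dist_to_pivot - radius)

    def subtrees_to_visit(self, dist_to_pivot: float, radius: float,
                          query_radius: float) -> list[str]:
        """Return the partitions to search, nearest first.

        The partition on the far side of the boundary is skipped when the
        current query radius is smaller than the estimated boundary distance.
        """
        query_inside = dist_to_pivot <= radius
        near = "inside" if query_inside else "outside"
        far = "outside" if query_inside else "inside"
        if query_radius < self.bound(dist_to_pivot, radius):
            return [near]
        return [near, far]


metric_rule = PiecewiseLinearPruner()                 # plain metric rule
stretched_rule = PiecewiseLinearPruner(alpha_left=2)  # prunes more aggressively
print(metric_rule.subtrees_to_visit(4.0, 5.0, 1.5))
print(stretched_rule.subtrees_to_visit(4.0, 5.0, 1.5))
```

In this sketch, coefficients above 1 prune more partitions, which speeds up search but may miss true neighbors, and coefficients below 1 do the opposite. A data-driven method would choose them to balance the two.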

[1]  David Thomas,et al.  The Art in Computer Programming , 2001 .

[2]  Gonzalo Navarro,et al.  Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces , 2003, Inf. Process. Lett..

[3]  Benjamin Bustos,et al.  On nonmetric similarity search problems in complex domains , 2011, CSUR.

[4]  Peter N. Yianilos,et al.  Data structures and algorithms for nearest neighbor search in general metric spaces , 1993, SODA '93.

[5]  Leonid Boytsov,et al.  Engineering Efficient and Effective Non-metric Space Library , 2013, SISAP.

[6]  Z. Meral Özsoyoglu,et al.  Indexing large metric spaces for similarity search queries , 1999, TODS.

[7]  Donald E. Knuth,et al.  The Art of Computer Programming: Volume 3: Sorting and Searching , 1998 .

[8]  Hanan Samet,et al.  Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.

[9]  Marianthi Markatou,et al.  Statistical Distances and Their Role in Robustness , 2016, 1612.07408.

[10]  Anthony K. H. Tung,et al.  Similarity Search on Bregman Divergence: Towards Non-Metric Indexing , 2009, Proc. VLDB Endow..

[11]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[12]  Gonzalo Navarro Searching in metric spaces by spatial approximation , 2002, The VLDB Journal.

[13]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[14]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[15]  A. Rényi On Measures of Entropy and Information , 1961 .

[16]  Jakub Lokoc,et al.  Ptolemaic access methods: Challenging the reign of the metric space model , 2013, Inf. Syst..

[17]  Ricardo A. Baeza-Yates,et al.  Searching in metric spaces , 2001, CSUR.

[18]  Tomás Skopal,et al.  Unified framework for fast exact and approximate search in dissimilarity spaces , 2007, TODS.

[19]  Lawrence Cayton,et al.  Fast nearest neighbor retrieval for bregman divergences , 2008, ICML '08.

[20]  Stephen M. Omohundro,et al.  Five Balltree Construction Algorithms , 2009 .

[21]  Leonid Boytsov,et al.  Learning to Prune in Metric and Non-Metric Spaces , 2013, NIPS.

[22]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[23]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[24]  Donald E. Knuth,et al.  The Art of Computer Programming, Vol. 3: Sorting and Searching , 1974 .

[25]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[26]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..