Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing

Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm. Its running time is linear in the dimensionality of the space and sub-linear in both the intrinsic dimensionality and the size of the dataset; its space usage is constant in the dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy to implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitive hashing (LSH) in terms of approximation quality, speed and space efficiency.
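The abstract does not spell out the algorithm, so the following is only a toy illustration of what a search structure that avoids space partitioning might look like; it is not the authors' method. It assumes one plausible reading: points are indexed by their projections onto a few random directions, kept in sorted order, and a query visits points in order of projected proximity before ranking the candidates by true distance. The class name `ProjectionIndex` and the parameters `num_directions` and `num_visits` are hypothetical, and the candidate rule (a plain union across directions) is a simplification chosen for brevity.

```python
import bisect
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly at random from the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ProjectionIndex:
    """Hypothetical toy index: sorted 1-D projections, no partitioning of space."""

    def __init__(self, dim, num_directions=8, seed=0):
        rng = random.Random(seed)
        self.directions = [random_unit_vector(dim, rng)
                           for _ in range(num_directions)]
        # One sorted list of (projection, point_id) pairs per direction.
        self.sorted_lists = [[] for _ in self.directions]
        self.points = {}
        self._next_id = 0

    def insert(self, point):
        """Dynamic update: add one point without rebuilding anything."""
        pid = self._next_id
        self._next_id += 1
        self.points[pid] = list(point)
        for u, lst in zip(self.directions, self.sorted_lists):
            bisect.insort(lst, (dot(u, point), pid))
        return pid

    def query(self, q, k, num_visits):
        """Return ids of up to k candidates, nearest first.

        num_visits is a per-query knob: larger values visit more points
        along each direction, trading speed for accuracy.
        """
        candidates = set()
        for u, lst in zip(self.directions, self.sorted_lists):
            qp = dot(u, q)
            hi = bisect.bisect_left(lst, (qp, -1))
            lo = hi - 1
            for _ in range(min(num_visits, len(lst))):
                # Step to whichever side is closer to the query's projection.
                take_lo = hi >= len(lst) or (
                    lo >= 0 and qp - lst[lo][0] <= lst[hi][0] - qp)
                if take_lo:
                    candidates.add(lst[lo][1])
                    lo -= 1
                else:
                    candidates.add(lst[hi][1])
                    hi += 1
        # Rank the visited candidates by true Euclidean distance.
        ranked = sorted(candidates,
                        key=lambda pid: math.dist(q, self.points[pid]))
        return ranked[:k]
```

In this sketch, setting `num_visits` to the dataset size makes every point a candidate, so the result is exact; smaller values give an approximate answer more quickly. Sorted Python lists keep the example short, but insertion into them costs linear time; a balanced tree or skip list would be the natural choice where updates need to be fast.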
