Approximate nearest neighbor queries in fixed dimensions

Given a set of n points in d-dimensional Euclidean space, S ⊂ E^d, and a query point q ∈ E^d, we wish to determine the nearest neighbor of q, that is, the point of S whose Euclidean distance to q is minimum. The goal is to preprocess the point set S so that queries can be answered as efficiently as possible. We assume that the dimension d is a constant independent of n. Although reasonably good solutions to this problem exist when d is small, the performance of these algorithms degrades rapidly as d increases. We present a randomized algorithm for approximate nearest neighbor searching. Given any set of n points S ⊂ E^d and a constant ε > 0, we produce a data structure such that, for any query point, it reports a point of S whose distance from the query point is at most (1 + ε) times the distance from the query point to its true nearest neighbor. Our algorithm runs in O(log^3 n) expected time and requires O(n log n) space. The data structure can be built in O(n^2) expected time. The constant factors depend on d and ε. Because of the practical importance of nearest neighbor searching in higher dimensions, we have implemented a practical variant of this algorithm and show empirically that, for many point distributions, this variant finds the nearest neighbor in moderately large dimensions significantly faster than existing practical approaches.
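The (1 + ε) guarantee above can be made concrete with a brute-force reference. The following Python sketch is not the randomized algorithm described in the abstract; it is a plain linear-scan baseline, assuming points are given as coordinate tuples, that computes the exact nearest neighbor in O(nd) time and checks whether a candidate answer (for example, one returned by some approximate method) satisfies the (1 + ε) bound. The function names exact_nearest and is_eps_approximate are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence

# A point in E^d, represented (by assumption) as a sequence of d floats.
Point = Sequence[float]


def exact_nearest(points: Sequence[Point], q: Point) -> int:
    """Hypothetical helper: index of the point of S closest to q.

    Brute-force linear scan, O(n * d) per query; a correctness baseline,
    not the data structure from the abstract.
    """
    if not points:
        raise ValueError("point set S must be non-empty")
    return min(range(len(points)), key=lambda i: math.dist(points[i], q))


def is_eps_approximate(points: Sequence[Point], q: Point,
                       candidate_index: int, eps: float) -> bool:
    """Hypothetical helper: does points[candidate_index] meet the (1 + eps) bound?

    True when dist(candidate, q) <= (1 + eps) * dist(true nearest neighbor, q).
    """
    true_dist = math.dist(points[exact_nearest(points, q)], q)
    return math.dist(points[candidate_index], q) <= (1.0 + eps) * true_dist
```

For example, with S = [(0, 0), (3, 4), (1, 1)] and q = (0.9, 1.2), the exact nearest neighbor is index 2 at distance about 0.224. The point (0, 0), at distance 1.5, is an acceptable answer for ε = 6 but not for ε = 0.5, since 1.5 is roughly 6.7 times the true nearest-neighbor distance.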
