An optimal algorithm for approximate nearest neighbor searching fixed dimensions

Consider a set of <italic>S</italic> of <italic>n</italic> data points in real <italic>d</italic>-dimensional space, R<supscrpt>d</supscrpt>, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess <italic>S</italic> into a data structure, so that given any query point <italic>q</italic><inline-equation> <f>∈</f></inline-equation> R<supscrpt>d</supscrpt>, is the closest point of S to <italic>q</italic> can be reported quickly. Given any positive real ε, data point <italic>p</italic> is a (1 +ε)-<italic>approximate nearest neighbor</italic> of <italic>q</italic> if its distance from <italic>q</italic> is within a factor of (1 + ε) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of <italic>n</italic> points in R<supscrpt>d</supscrpt> in <italic>O(dn</italic> log <italic>n</italic>) time and <italic>O(dn)</italic> space, so that given a query point <italic> q</italic> <inline-equation> <f>∈</f></inline-equation> R<supscrpt>d</supscrpt>, and ε > 0, a (1 + ε)-approximate nearest neighbor of <italic>q</italic> can be computed in <italic>O</italic>(<italic>c</italic><subscrpt><italic>d</italic>, ε</subscrpt> log <italic>n</italic>) time, where <italic>c<subscrpt>d,ε</subscrpt></italic>≤<italic>d</italic> <inline-equation> <f><fen lp="ceil">1 + 6d/<g>e</g><rp post="ceil"></fen></f></inline-equation>;<supscrpt>d</supscrpt> is a factor depending only on dimension and ε. In general, we show that given an integer <italic>k</italic> ≥ 1, (1 + ε)-approximations to the <italic>k</italic> nearest neighbors of <italic>q</italic> can be computed in additional <italic>O(kd</italic> log <italic>n</italic>) time.

[1]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[2]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[3]  Ronald L. Rivest,et al.  On the Optimality of Elia's Algorithm for Performing Best-Match Searches , 1974, IFIP Congress.

[4]  Peter E. Hart,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[5]  Forest Baskett,et al.  An Algorithm for Finding Nearest Neighbors , 1975, IEEE Transactions on Computers.

[6]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[7]  John G. Cleary,et al.  Analysis of an Algorithm for Finding Nearest Neighbors in Euclidean Space , 1979, TOMS.

[8]  Bruce W. Weide,et al.  Optimal Expected-Time Algorithms for Closest Point Problems , 1980, TOMS.

[9]  Robert E. Tarjan,et al.  A data structure for dynamic trees , 1981, STOC '81.

[10]  L. Devroye,et al.  8 Nearest neighbor methods in discrimination , 1982, Classification, Pattern Recognition and Reduction of Dimensionality.

[11]  Bernard Chazelle,et al.  A theorem on polygon cutting with applications , 1982, 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982).

[12]  Kenneth L. Clarkson,et al.  Fast algorithms for the all nearest neighbors problem , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[13]  Robert E. Tarjan,et al.  Fibonacci heaps and their uses in improved network optimization algorithms , 1984, JACM.

[14]  Nariman Farvardin,et al.  Rate-distortion performance of DPCM schemes for autoregressive sources , 1985, IEEE Trans. Inf. Theory.

[15]  Greg N. Frederickson,et al.  Data Structures for On-Line Updating of Minimum Spanning Trees, with Applications , 1985, SIAM J. Comput..

[16]  Robert M. Gray,et al.  An Improvement of the Minimum Distortion Encoding Algorithm for Vector Quantization , 1985, IEEE Trans. Commun..

[17]  Andrew Chi-Chih Yao,et al.  A general approach to d-dimensional geometric queries , 1985, STOC '85.

[18]  Michael Ian Shamos,et al.  Computational geometry: an introduction , 1985 .

[19]  Leonidas J. Guibas,et al.  Linear time algorithms for visibility and shortest path problems inside simple polygons , 2011, SCG '86.

[20]  Herbert Edelsbrunner,et al.  Algorithms in Combinatorial Geometry , 1987, EATCS Monographs in Theoretical Computer Science.

[21]  M. Reza Soleymani,et al.  An Efficient Nearest Neighbor Search Method , 1987, IEEE Trans. Commun..

[22]  Tomás Feder,et al.  Optimal algorithms for approximate clustering , 1988, STOC '88.

[23]  Kenneth L. Clarkson,et al.  A Randomized Algorithm for Closest-Point Queries , 1988, SIAM J. Comput..

[24]  Hanan Samet,et al.  The Design and Analysis of Spatial Data Structures , 1989 .

[25]  Hanan Samet,et al.  Hierarchical Spatial Data Structures , 1989, SSD.

[26]  Pravin M. Vaidya,et al.  AnO(n logn) algorithm for the all-nearest-neighbors Problem , 1989, Discret. Comput. Geom..

[27]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[28]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[29]  Mohamed S. Kamel,et al.  Equal-average hyperplane partitioning method for vector quantization of image data , 1992, Pattern Recognit. Lett..

[30]  S. Rao Kosaraju,et al.  A decomposition of multi-dimensional point-sets with applications to k-nearest-neighbors and n-body potential fields (preliminary version) , 1992, STOC '92.

[31]  Michiel H. M. Smid,et al.  Enumerating the k closest pairs optimally , 1992, Proceedings., 33rd Annual Symposium on Foundations of Computer Science.

[32]  Jirí Matousek,et al.  Ray shooting and parametric search , 1992, STOC '92.

[33]  S. Meiser,et al.  Point Location in Arrangements of Hyperplanes , 1993, Inf. Comput..

[34]  Marshall W. Bern,et al.  Approximate Closest-Point Queries in High Dimensions , 1993, Inf. Process. Lett..

[35]  Ronald L. Rivest,et al.  Scapegoat trees , 1993, SODA '93.

[36]  David Eppstein,et al.  Parallel Construction of Quadtrees and Quality Triangulations , 1993, WADS.

[37]  Sunil Arya,et al.  Approximate nearest neighbor queries in fixed dimensions , 1993, SODA '93.

[38]  Michiel H. M. Smid,et al.  Randomized data structures for the dynamic closest-pair problem , 1998, SODA '93.

[39]  Sunil Arya,et al.  Algorithms for fast vector quantization , 1993, [Proceedings] DCC `93: Data Compression Conference.

[40]  Greg N. Frederickson,et al.  A data structure for dynamically maintaining rooted trees , 1997, SODA '93.

[41]  C.-H. Lee,et al.  Fast closest codeword search algorithm for vector quantization , 1994 .

[42]  Kenneth L. Clarkson,et al.  An algorithm for approximate closest-point queries , 1994, SCG '94.

[43]  Sunil Arya,et al.  Accounting for boundary effects in nearest neighbor searching , 1995, SCG '95.

[44]  S. Rao Kosaraju,et al.  A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields , 1995, JACM.

[45]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[46]  Sergei Bespamyatnikh,et al.  An Optimal Algorithm for Closest-Pair Maintenance , 1998, Discret. Comput. Geom..

[47]  S. Rao Kosaraju,et al.  Algorithms for dynamic closest pair and n-body potential fields , 1995, SODA '95.

[48]  Sergei N. Bespamyatnikh An optimal algorithm for closest pair maintenance (extended abstract) , 1995, SoCG 1995.

[49]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[50]  Sunil Arya,et al.  Approximate range searching , 1995, SCG '95.

[51]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[52]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[53]  Timothy M. Chan Approximate Nearest Neighbor Queries Revisited , 1997, SCG '97.

[54]  Jon M. Kleinberg,et al.  Two algorithms for nearest-neighbor search in high dimensions , 1997, STOC '97.

[55]  Mark de Berg,et al.  Computational geometry: algorithms and applications , 1997 .

[56]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[57]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[58]  Rafail Ostrovsky,et al.  Efficient search for approximate nearest neighbor in high dimensional spaces , 1998, STOC '98.

[59]  David M. Mount,et al.  Chromatic nearest neighbor searching: A query sensitive approach , 2000, Comput. Geom..