Expected-case complexity of approximate nearest neighbor searching

Most research in algorithms for geometric query problems has focused on their worst-case performance. However, when information on the query distribution is available, the alternative paradigm of designing and analyzing algorithms from the perspective of expected-case performance appears more attractive. We study the approximate nearest neighbor problem from this perspective. As a first step in this direction, we assume that the query points are sampled uniformly from a hypercube that encloses all the data points; however, we make no assumption on the distribution of the data points. We show that with a simple partition tree, called the sliding-midpoint tree, it is possible to achieve linear space and logarithmic query time in the expected case; in contrast, the data structures known to achieve linear space and logarithmic query time in the worst case are complex, and algorithms on them run more slowly in practice. Moreover, we prove that the sliding-midpoint tree achieves optimal expected query time in a certain class of algorithms.
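To make the data structure concrete, the following is a minimal sketch of a sliding-midpoint partition tree. The abstract does not spell out the construction, so the splitting rule used here is an assumption based on how the rule is commonly described: cut the cell at the midpoint of its longest side, and if every point falls on one side, slide the cut to the nearest data point so that both children are nonempty. The names `Node`, `build`, `locate`, and `collect_leaves` are hypothetical, and `locate` only finds the leaf cell containing a query; it is not the approximate nearest neighbor search procedure analyzed in the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point                              # lower corner of the cell
    hi: Point                              # upper corner of the cell
    points: Optional[List[Point]] = None   # set only at leaves
    dim: int = -1                          # splitting dimension (internal nodes)
    cut: float = 0.0                       # splitting coordinate (internal nodes)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], lo: Point, hi: Point, leaf_size: int = 1) -> Node:
    """Build a tree over `points`, all assumed to lie in the box [lo, hi]."""
    lo, hi = tuple(lo), tuple(hi)
    if len(points) <= leaf_size:
        return Node(lo, hi, points=list(points))

    # Assumed rule: split the longest side of the cell at its midpoint.
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    cut = (lo[dim] + hi[dim]) / 2.0
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]

    # Slide the cut toward the points if one side is empty, so that
    # each child receives at least one point.
    if not left:
        ordered = sorted(points, key=lambda p: p[dim])
        cut = ordered[0][dim]
        left, right = ordered[:1], ordered[1:]
    elif not right:
        ordered = sorted(points, key=lambda p: p[dim])
        cut = ordered[-1][dim]
        left, right = ordered[:-1], ordered[-1:]

    left_hi = hi[:dim] + (cut,) + hi[dim + 1:]
    right_lo = lo[:dim] + (cut,) + lo[dim + 1:]
    return Node(lo, hi, dim=dim, cut=cut,
                left=build(left, lo, left_hi, leaf_size),
                right=build(right, right_lo, hi, leaf_size))


def locate(root: Node, q: Point) -> Node:
    """Descend to the leaf whose cell contains the query point q."""
    node = root
    while node.points is None:
        node = node.left if q[node.dim] < node.cut else node.right
    return node


def collect_leaves(node: Node) -> List[Node]:
    """Return all leaves of the subtree rooted at `node`."""
    if node.points is not None:
        return [node]
    return collect_leaves(node.left) + collect_leaves(node.right)


if __name__ == "__main__":
    data = [(0.1, 0.1), (0.12, 0.11), (0.9, 0.8), (0.11, 0.13)]
    tree = build(data, (0.0, 0.0), (1.0, 1.0))
    print(len(collect_leaves(tree)), locate(tree, (0.95, 0.9)).points)
```

Because each split leaves at least one point on both sides, the sketch creates at most n leaves and n - 1 internal nodes for n points, which is consistent with the linear space bound stated in the abstract.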
