Paper information - Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently from a d-dimensional ball under the uniform distribution. In the average case, for fixed dimension d, we achieve a query time of O(log^2 n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This is an improvement over previously known time-space tradeoffs, which all have a super-exponential factor of at least d^{Θ(d)} either in the query time or in the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log^2 n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
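To make the problem setting concrete, the following is a minimal Python sketch of the input model and of the naive baseline, not of the paper's data structure (the abstract does not give enough detail to reconstruct the graph). It draws n points uniformly from the d-dimensional unit ball and answers a query by linear scan in O(n·d) time, which is the cost the proposed structure is meant to beat. All names here (`sample_uniform_ball`, `nearest_neighbor`, `euclidean`) are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def sample_uniform_ball(n: int, d: int, rng: random.Random) -> List[List[float]]:
    """Draw n points independently and uniformly from the d-dimensional unit ball."""
    points = []
    for _ in range(n):
        # A normalized Gaussian vector gives a uniformly random direction.
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        # Radius U^(1/d) makes the point uniform over the ball's volume.
        r = rng.random() ** (1.0 / d)
        points.append([r * x / norm for x in g])
    return points


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_neighbor(
    query: Point,
    data: Sequence[Point],
    dist: Callable[[Point, Point], float] = euclidean,
) -> int:
    """Return the index of the data point closest to query (brute-force scan)."""
    return min(range(len(data)), key=lambda i: dist(query, data[i]))


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_uniform_ball(n=1000, d=3, rng=rng)
    query = [0.1, -0.2, 0.3]
    i = nearest_neighbor(query, data)
    print(i, euclidean(query, data[i]))
```

Passing the distance function as a parameter mirrors the abstract's remark that the definition is not tied to Euclidean space; any callable that takes two objects and returns a number can be substituted for `euclidean`.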

Michiel Hagedoorn
