Fast nearest neighbor search in high-dimensional space

Similarity search in multimedia databases requires efficient support for nearest neighbor search on a large set of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest neighbor search, which corresponds to computing the Voronoi cell of each data point. In a second step, we store the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, nearest neighbor search reduces to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation demonstrates the high efficiency of our technique for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest neighbor search in the X-tree (up to a factor of 4).
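The core idea, that a nearest neighbor query becomes a point query once the Voronoi cells are known, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: each cell is kept as an explicit list of bisector half-spaces, and a linear scan over the cells stands in for the high-dimensional index structure. All function names (`voronoi_halfspaces`, `build_cells`, `nn_by_point_query`, and so on) are hypothetical and chosen for illustration only.

```python
import random


def dist2(p, x):
    """Squared Euclidean distance between two points."""
    return sum((pk - xk) ** 2 for pk, xk in zip(p, x))


def voronoi_halfspaces(points, i):
    """Half-spaces (a, b), meaning a.x <= b, whose intersection is the
    Voronoi cell of points[i]. Each constraint is the bisector between
    points[i] and one other point: |x - p|^2 <= |x - q|^2 rearranges to
    (q - p).x <= (|q|^2 - |p|^2) / 2."""
    p = points[i]
    constraints = []
    for j, q in enumerate(points):
        if j == i:
            continue
        a = [qk - pk for pk, qk in zip(p, q)]
        b = (sum(qk * qk for qk in q) - sum(pk * pk for pk in p)) / 2.0
        constraints.append((a, b))
    return constraints


def build_cells(points):
    """Precomputation step: one half-space description per data point."""
    return [voronoi_halfspaces(points, i) for i in range(len(points))]


def in_cell(cell, x, eps=1e-12):
    """Point-in-cell test: x must satisfy every bisector constraint."""
    return all(
        sum(ak * xk for ak, xk in zip(a, x)) <= b + eps for a, b in cell
    )


def nn_by_point_query(cells, x):
    """Point query: return the index of a cell that contains x.
    Assumption: a linear scan replaces the index structure here."""
    for i, cell in enumerate(cells):
        if in_cell(cell, x):
            return i
    return None


def nn_brute_force(points, x):
    """Reference answer by scanning all distances."""
    return min(range(len(points)), key=lambda i: dist2(points[i], x))


if __name__ == "__main__":
    random.seed(1)
    data = [[random.random() for _ in range(8)] for _ in range(100)]
    cells = build_cells(data)
    query = [random.random() for _ in range(8)]
    print(nn_by_point_query(cells, query), nn_brute_force(data, query))
```

The sketch stores a number of constraints per cell that grows with the number of data points, which is only workable for small examples. It shows the equivalence between the two query types, not the storage scheme or the index used in the paper.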
