Indexing the Solution Space: A New Technique for Nearest Neighbor Search in High-Dimensional Space

Similarity search in multimedia databases requires efficient support for nearest-neighbor search on large sets of high-dimensional points as a basic operation for query processing. As recent theoretical results show, state-of-the-art approaches to nearest-neighbor search are not efficient in higher dimensions. In our new approach, we therefore precompute the result of any nearest-neighbor search, which corresponds to computing the Voronoi cell of each data point. In a second step, we store conservative approximations of the Voronoi cells in an index structure that is efficient for high-dimensional data spaces. As a result, nearest-neighbor search corresponds to a simple point query on the index structure. Although our technique is based on a precomputation of the solution space, it is dynamic, i.e., it supports insertions of new data points. An extensive experimental evaluation of our technique demonstrates its high efficiency for uniformly distributed as well as real data. We obtained a significant reduction of the search time compared to nearest-neighbor search in other index structures such as the X-tree.
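The idea of answering a nearest-neighbor query by a point query on approximated Voronoi cells can be illustrated with a minimal two-dimensional sketch. It is not the implementation described in the abstract. Several assumptions are made for illustration only: the data space is the unit square, the data points are distinct, each Voronoi cell is computed exactly by clipping the square against bisector half-planes (a simple 2D substitute for whatever cell computation a high-dimensional setting would need), the conservative approximation is the cell's axis-parallel bounding box, and a linear scan over the boxes stands in for the high-dimensional index structure. The names `clip_halfplane`, `voronoi_mbr`, and `nearest_neighbor` are hypothetical.

```python
from math import dist


def clip_halfplane(poly, a, b, c):
    """Return the part of the convex polygon `poly` where a*x + b*y <= c."""
    out = []
    n = len(poly)
    for i in range(n):
        p, q = poly[i], poly[(i + 1) % n]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)  # p lies inside the half-plane (or on its border)
        if (fp < 0 < fq) or (fq < 0 < fp):
            # The edge p-q crosses the border: add the intersection point.
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_mbr(i, points):
    """Bounding box (x0, y0, x1, y1) of the Voronoi cell of points[i].

    Assumes distinct points inside the unit square [0, 1] x [0, 1].
    """
    px, py = points[i]
    cell = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    for j, (qx, qy) in enumerate(points):
        if j == i:
            continue
        # x is at least as close to p as to q  <=>  2(q - p).x <= |q|^2 - |p|^2
        cell = clip_halfplane(
            cell,
            2 * (qx - px),
            2 * (qy - py),
            qx * qx + qy * qy - px * px - py * py,
        )
    xs = [v[0] for v in cell]
    ys = [v[1] for v in cell]
    return (min(xs), min(ys), max(xs), max(ys))


def nearest_neighbor(q, points, mbrs, eps=1e-9):
    """Point query: collect every box containing q, then refine by distance.

    The linear scan is a stand-in for a point query on an index structure.
    The boxes are conservative, so several may contain q; the true nearest
    neighbor is always among the candidates.
    """
    candidates = [
        i
        for i, (x0, y0, x1, y1) in enumerate(mbrs)
        if x0 - eps <= q[0] <= x1 + eps and y0 - eps <= q[1] <= y1 + eps
    ]
    return min(candidates, key=lambda i: dist(q, points[i]))


# Precomputation step: one approximated cell per data point.
points = [(0.2, 0.3), (0.7, 0.8), (0.5, 0.1), (0.9, 0.4), (0.1, 0.9)]
mbrs = [voronoi_mbr(i, points) for i in range(len(points))]

# Query step: a point query on the stored approximations.
print(nearest_neighbor((0.6, 0.7), points, mbrs))
```

Because the bounding boxes overlap, the point query usually returns more than one candidate, and the final distance comparison picks the true nearest neighbor among them. The tighter the approximations, the fewer candidates need to be checked. In this sketch, inserting a new point would simply recompute all boxes, whereas the abstract states that the actual technique supports insertions dynamically.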
