Processing Distance-Based Queries in Multidimensional Data Spaces Using R-trees

In modern database applications, the similarity or dissimilarity of data objects is examined by performing distance-based queries (DBQs) on multidimensional data. The R-tree and its variations are commonly cited multidimensional access methods. In this paper, we investigate the performance of the most representative distance-based queries in multidimensional data spaces, where the point datasets are indexed by tree-like structures belonging to the R-tree family. To perform the K-nearest neighbor query (K-NNQ) and the K-closest pair query (K-CPQ), non-incremental recursive branch-and-bound algorithms are employed. The K-CPQ is shown to be a very expensive query for datasets of high cardinality, and it becomes even more costly as the dimensionality increases. We also give ɛ-approximate versions of the DBQ algorithms that run faster than the exact ones, at the expense of introducing a relative distance error in the result. Experimentation with synthetic multidimensional point datasets, following Uniform and Gaussian distributions, reveals that the best index structure for K-NNQ is the X-tree. For K-CPQ, however, the R*-tree outperforms the X-tree with respect to response time and number of disk accesses when an LRU buffer is used. Moreover, applying the ɛ-approximate technique to the recursive K-CPQ algorithm quickly yields acceptable approximations of the result, although users cannot easily control the tradeoff between cost and accuracy.
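To make the two query types concrete, the following is a minimal brute-force sketch of what K-NNQ and K-CPQ return. It is not the recursive branch-and-bound algorithm evaluated in the paper and uses no R-tree index; it only serves as a reference definition of the results. The function names, the choice of Euclidean distance, and the relative-error formula (approximate distance minus exact distance, divided by the exact distance) are assumptions made for illustration.

```python
import heapq
import math
from itertools import product


def k_nearest_neighbors(points, query, k):
    """K-NNQ by exhaustive scan: the k points closest to `query`, nearest first."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


def k_closest_pairs(set_p, set_q, k):
    """K-CPQ by exhaustive scan: the k pairs (p, q), p in set_p and q in set_q,
    with the smallest distance, closest first.

    Examines all |P| * |Q| pairs, which hints at why the query is expensive
    for datasets of high cardinality.
    """
    return heapq.nsmallest(
        k, product(set_p, set_q), key=lambda pq: math.dist(pq[0], pq[1])
    )


def relative_error(approx_dist, exact_dist):
    """Relative distance error of an approximate answer (assumed definition)."""
    return (approx_dist - exact_dist) / exact_dist


if __name__ == "__main__":
    P = [(0.0, 0.0), (10.0, 10.0)]
    Q = [(0.0, 1.0), (10.0, 12.0), (3.0, 3.0)]
    print(k_nearest_neighbors(Q, (0.0, 0.0), 2))
    print(k_closest_pairs(P, Q, 2))
```

An index-based algorithm would be expected to return the same exact results while pruning most of the candidate pairs; an ɛ-approximate variant may return results whose distances exceed the exact ones by some relative error.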
