A Model for k-Nearest Neighbor Query Cost in Multidimensional Index Structures

The k-nearest neighbor query in multidimensional index structures is one of the most frequently used query types in multimedia databases and geographic information systems. Until now, most analytic models have been restricted to a particular type of index structure, for example the R-tree, and have concentrated on the analysis of the range query. Recently, a cost model [3] was reported for nearest neighbor queries. However, that model considered only 1-nearest neighbor queries rather than k-nearest neighbor queries. In this paper, we present an analytic model for the cost of the k-nearest neighbor query in multidimensional index structures. As a basis of the model, we introduce the concepts of the regional average volume and the varying density function. The model has the following advantages: it is applicable to datasets with arbitrary distributions (uniform and non-uniform), it works for the k- as well as the 1-nearest neighbor query, and it is a dynamic analysis method that enables rapid analysis without requiring a time-consuming simulation of data. To evaluate the accuracy of our model, we conducted a wide range of experiments on datasets with various distributions. The results show that our analytic model is accurate for datasets with non-uniform as well as uniform distributions in low and mid dimensions.
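
To make the quantity being modeled concrete, the following is a minimal brute-force sketch, not the analytic model of this paper. It assumes points in the unit hypercube, Euclidean distance, and a uniform grid whose non-empty cells stand in for index pages; the function names (`knn`, `min_dist_to_box`, `pages_touched`) and the grid layout are hypothetical choices made only for illustration. It counts the cells that intersect the sphere around the query whose radius is the distance to the k-th nearest neighbor, which is the kind of page-access count a cost model would try to predict without running the query.

```python
import math
import random


def knn(points, query, k):
    """Return the k points closest to query (Euclidean), nearest first."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


def min_dist_to_box(query, lo, hi):
    """Smallest Euclidean distance from query to the axis-aligned box [lo, hi]."""
    return math.sqrt(
        sum(max(l - q, 0.0, q - h) ** 2 for q, l, h in zip(query, lo, hi))
    )


def pages_touched(points, query, k, cells_per_dim):
    """Count non-empty grid cells (stand-ins for index pages, an assumption
    of this sketch) that intersect the k-nearest-neighbor sphere."""
    # Radius of the sphere: distance to the k-th nearest neighbor.
    radius = math.dist(knn(points, query, k)[-1], query)
    side = 1.0 / cells_per_dim
    # Map every point to the grid cell that contains it.
    occupied = {
        tuple(min(int(c / side), cells_per_dim - 1) for c in p) for p in points
    }
    count = 0
    for cell in occupied:
        lo = [i * side for i in cell]
        hi = [(i + 1) * side for i in cell]
        if min_dist_to_box(query, lo, hi) <= radius:
            count += 1
    return count


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(1000)]
    q = (0.5, 0.5)
    for k in (1, 10, 100):
        print(k, pages_touched(data, q, k, cells_per_dim=10))
```

Because the sphere radius cannot shrink as k grows, the count returned by `pages_touched` is non-decreasing in k for a fixed dataset and query. A cost model restricted to k = 1 therefore cannot account for the additional pages read by larger k.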

[1]  Christos Faloutsos,et al.  Analysis of n-Dimensional Quadtrees using the Hausdorff Fractal Dimension , 1996, VLDB.

[2]  Chin-Wan Chung,et al.  Analysis of Nearest Neighbor Query Performance in Multidimensional Index Structures , 1997, DEXA.

[3]  Tzi-cker Chiueh,et al.  Content-Based Image Indexing , 1994, VLDB.

[4]  Ambuj K. Singh,et al.  Efficient retrieval for browsing large image databases , 1996, CIKM '96.

[5]  Hanan Samet,et al.  Ranking in Spatial Databases , 1995, SSD.

[6]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[7]  Chin-Wan Chung,et al.  HG-tree: an index structure for multimedia databases , 1996, Proceedings of the Third IEEE International Conference on Multimedia Computing and Systems.

[8]  Christos Faloutsos,et al.  Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension , 1994, PODS.

[9]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[10]  Christos Faloutsos,et al.  Estimating the Selectivity of Spatial Queries Using the 'Correlation' Fractal Dimension , 1995, VLDB.

[11]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[12]  Hans-Peter Kriegel,et al.  The Buddy-Tree: An Efficient and Robust Access Method for Spatial Data Base Systems , 1990, VLDB.

[13]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[14]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[15]  Sergey Brin,et al.  Near Neighbor Search in Large Metric Spaces , 1995, VLDB.

[16]  Timos K. Sellis,et al.  A model for the prediction of R-tree performance , 1996, PODS.

[17]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[18]  Bernd-Uwe Pagel,et al.  Towards an analysis of range query performance in spatial data structures , 1993, PODS '93.

[19]  Christos Faloutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.