A cost model for estimating the performance of spatial joins using R-trees

The development of a cost model for predicting the performance of spatial joins has been identified in the literature as an important and difficult problem. The authors present the first cost model that can predict the performance of spatial joins using R-trees. Given two existing R-trees (the join targets), the model first estimates the expected number of I/Os for the join process under the assumption of a zero buffer size. This estimation extends the cost model for R-tree window queries, developed by Kamel and Faloutsos (1993) and by Pagel et al. (1993), to the more complex case of spatial joins. For spatial join processing, however, the zero-buffer expected I/O count is not by itself a practical performance predictor in a buffered environment. To model the impact of the buffer, the authors use an exponential distribution function to measure the probability that a bufferless I/O would cause a page fault in a buffered environment. From this probability and the zero-buffer expected I/O cost, the estimated number of I/Os for an R-tree join can then be computed. Comparisons between the model's predictions and actual results from experiments on real GIS maps show an average relative error of about 10%, with a maximum of about 20%, over a wide range of buffer sizes. The model is therefore a useful tool for optimizing spatial join queries.
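The two-stage idea described above (a zero-buffer estimate, then a buffer correction) can be illustrated with a minimal Python sketch. The window-query function follows the well-known form from the Kamel and Faloutsos line of work, in which a node with a w-by-h bounding rectangle in the unit square is hit by a qx-by-qy query with probability (w + qx)(h + qy), ignoring boundary effects. The buffer part is only an assumption for illustration: the abstract does not give the exponential function or how it is combined with the zero-buffer cost, so `page_fault_probability`, its `rate` parameter, and the combination rule in `buffered_io_estimate` are hypothetical and are not the paper's formulas.

```python
import math
from typing import Iterable, Tuple

# (width, height) of a node's bounding rectangle, normalized to the unit square
Rect = Tuple[float, float]


def window_query_accesses(nodes: Iterable[Rect], qx: float, qy: float) -> float:
    """Expected number of R-tree nodes touched by a qx-by-qy window query.

    Each node contributes (w + qx) * (h + qy); boundary effects are ignored.
    """
    return sum((w + qx) * (h + qy) for w, h in nodes)


def page_fault_probability(buffer_pages: int, distinct_pages: int,
                           rate: float = 1.0) -> float:
    """HYPOTHETICAL exponential model of the page-fault probability.

    Assumed form: exp(-rate * buffer_pages / distinct_pages). With no buffer
    the probability is 1; it decays as the buffer grows.
    """
    if distinct_pages <= 0:
        raise ValueError("distinct_pages must be positive")
    return math.exp(-rate * buffer_pages / distinct_pages)


def buffered_io_estimate(zero_buffer_ios: float, buffer_pages: int,
                         distinct_pages: int, rate: float = 1.0) -> float:
    """HYPOTHETICAL combination of the zero-buffer cost and fault probability.

    Assumption: the first read of each distinct page always costs one I/O,
    and each repeated access faults with the probability computed above.
    """
    p = page_fault_probability(buffer_pages, distinct_pages, rate)
    repeated = max(zero_buffer_ios - distinct_pages, 0.0)
    return distinct_pages + repeated * p


if __name__ == "__main__":
    nodes = [(0.1, 0.2), (0.3, 0.1)]
    print(window_query_accesses(nodes, 0.1, 0.1))
    for b in (0, 50, 200):
        print(b, buffered_io_estimate(1000.0, b, 200))
```

With a buffer of zero pages the sketch returns the zero-buffer cost unchanged, and as the buffer grows the estimate falls toward the number of distinct pages, which is the qualitative behavior a buffer model of this kind is meant to capture.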

[1]  David J. DeWitt,et al.  Partition based spatial-merge join , 1996, SIGMOD '96.

[2]  Mike Tanner Practical Queueing Analysis , 1995 .

[3]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[4]  Christos Faloutsos,et al.  The R+-tree: a dynamic index for multi-dimensional objects , 1987, Very Large Data Bases Conference.

[5]  Hans-Peter Kriegel,et al.  Efficient processing of spatial joins using R-trees , 1993, SIGMOD Conference.

[6]  Jiawei Han,et al.  Distance-associated join indices for spatial range search , 1992, [1992] Eighth International Conference on Data Engineering.

[7]  Elke A. Rundensteiner,et al.  Spatial Joins Using R-trees: Breadth-First Traversal with Global Optimizations , 1997, VLDB.

[8]  Jürg Nievergelt,et al.  The Grid File: An Adaptable, Symmetric Multikey File Structure , 1984, TODS.

[9]  Elke A. Rundensteiner,et al.  Integrated query processing strategies for spatial path queries , 1997, Proceedings 13th International Conference on Data Engineering.

[10]  Ming-Ling Lo,et al.  Spatial joins using seeded trees , 1994, SIGMOD '94.

[11]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[12]  Christos Faloutsos,et al.  On packing R-trees , 1993, CIKM '93.

[13]  Bernd-Uwe Pagel,et al.  Towards an analysis of range query performance in spatial data structures , 1993, PODS '93.

[14]  Michael Stonebraker,et al.  The SEQUOIA 2000 storage benchmark , 1993, SIGMOD '93.

[15]  Doron Rotem Spatial join indices , 1991, [1991] Proceedings. Seventh International Conference on Data Engineering.

[16]  Christos Faloutsos,et al.  Analysis of object oriented spatial access methods , 1987, SIGMOD '87.

[17]  A. Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984 .

[18]  Ming-Ling Lo,et al.  Spatial hash-joins , 1996, SIGMOD '96.

[19]  Timos K. Sellis,et al.  A model for the prediction of R-tree performance , 1996, PODS.

[20]  Oliver Günther Efficient Computation of Spatial Joins , 1993, ICDE.

[21]  Michael Stonebraker,et al.  The Implementation of Postgres , 1990, IEEE Trans. Knowl. Data Eng..

[22]  Elke A. Rundensteiner,et al.  Improving Spatial Intersect Joins Using Symbolic Intersect Detection , 1997, SSD.

[23]  Christos Faloutsos,et al.  Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension , 1994, PODS.

[24]  Jack A. Orenstein Spatial query processing in an object-oriented database system , 1986, SIGMOD '86.

[25]  Michael Ubell,et al.  The Montage extensible DataBlade architecture , 1994, SIGMOD '94.