The Design and Implementation of Seeded Trees: An Efficient Method for Spatial Joins

Existing methods for spatial joins require pre-existing spatial indices or other precomputation, but such approaches are inefficient and limited in generality. The operand data sets of a spatial join may not all have precomputed indices, particularly when they are generated dynamically by other selection or join operations. Also, existing spatial indices are designed mostly for spatial selections and are not always efficient for joins. This paper explores the design and implementation of seeded trees, which are effective for spatial joins and efficient to construct at join time. Seeded trees are R-tree-like structures, but they are divided into seed levels and grown levels. This structure makes it possible to use information about the join to accelerate the join process, and it allows efficient buffer management. In addition to the basic structure and behavior of seeded trees, we present techniques for efficient seeded tree construction, a new buffer management strategy to lower I/O costs, and a theoretical analysis for choosing algorithmic parameters. We also present methods for reducing space requirements and improving the stability of seeded tree performance with no additional I/O costs. Our performance studies show that the seeded tree method outperforms other tree-based methods by a wide margin, both in the number of disk pages accessed and in weighted I/O costs. Further, its performance gain is stable across different input data, and its incurred CPU penalties are also lower.
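To make the split into seed levels and grown levels concrete, the following is a minimal sketch and not the implementation described in the paper. It makes several assumptions that the abstract does not state: there is a single seed level; its slots are initialized from the node bounding boxes of the other operand's existing index (one possible way of using information about the join); an object is routed to the slot whose seed box needs the least area enlargement; and each grown subtree is replaced by a flat list. The names `Slot`, `SeededTree`, and `spatial_join` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Entry = Tuple[str, Rect]                  # (object id, bounding box)


def intersects(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


@dataclass
class Slot:
    """One entry of the (assumed single) seed level."""
    seed_box: Rect                                     # guides insertion only
    grown: List[Entry] = field(default_factory=list)   # stand-in for a grown subtree
    mbr: Optional[Rect] = None                         # true bounds of the grown data


class SeededTree:
    def __init__(self, seed_boxes: List[Rect]) -> None:
        # Assumption: seed boxes come from the other operand's index.
        self.slots = [Slot(box) for box in seed_boxes]

    def insert(self, oid: str, rect: Rect) -> None:
        # Assumed routing rule: least enlargement of the seed box.
        slot = min(
            self.slots,
            key=lambda s: area(union(s.seed_box, rect)) - area(s.seed_box),
        )
        slot.grown.append((oid, rect))
        slot.mbr = rect if slot.mbr is None else union(slot.mbr, rect)


def spatial_join(tree: SeededTree,
                 indexed_nodes: List[Tuple[Rect, List[Entry]]]) -> List[Tuple[str, str]]:
    """Pair up intersecting objects, pruning with slot and node bounding boxes."""
    pairs = []
    for slot in tree.slots:
        if slot.mbr is None:
            continue  # nothing was grown under this slot
        for node_box, entries in indexed_nodes:
            if not intersects(slot.mbr, node_box):
                continue  # no object pair below these two boxes can intersect
            for rid, rrect in slot.grown:
                for sid, srect in entries:
                    if intersects(rrect, srect):
                        pairs.append((rid, sid))
    return sorted(pairs)


# Illustrative data: S has an index with two leaf nodes; R has no index.
s_nodes = [
    ((0, 0, 10, 10), [("s1", (1, 1, 3, 3)), ("s2", (6, 6, 9, 9))]),
    ((20, 20, 30, 30), [("s3", (21, 21, 25, 25))]),
]
r_objects = [("r1", (2, 2, 4, 4)), ("r2", (22, 22, 23, 23)), ("r3", (50, 50, 51, 51))]

tree = SeededTree([box for box, _ in s_nodes])
for oid, rect in r_objects:
    tree.insert(oid, rect)
result = spatial_join(tree, s_nodes)
```

In this toy setup, seeding with the other operand's node boxes tends to group the unindexed objects along the same spatial boundaries as the existing index, so many slot and node combinations can be skipped after a single bounding-box test.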
