Data Partitioning for Parallel Spatial Join Processing

The cost of spatial join processing can be very high because spatial objects are large and spatial operations are computation-intensive. Parallel processing seems a natural solution, but it is not clear how spatial data should be partitioned for this purpose. This paper examines various spatial data partitioning methods and proposes a framework for parallel spatial join processing that combines the data-partitioning techniques used by most parallel join algorithms in relational databases with the filter-and-refine strategy for spatial operation processing. Object duplication caused by multi-assignment in spatial data partitioning can incur extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition, and we show that our new algorithms achieve near-optimal speedup for parallel spatial join processing.
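To make the multi-assignment issue concrete, the following is a minimal Python sketch of a grid-partitioned filter step. It illustrates the general idea only and is not the algorithm proposed in the paper: the uniform grid, the `cell_size` parameter, and all function names are assumptions introduced for this example. Objects are represented by their minimum bounding rectangles (MBRs), and an object whose MBR spans several cells is duplicated into each of them.

```python
from collections import defaultdict
from itertools import product

# MBRs are (xmin, ymin, xmax, ymax). All names are hypothetical.

def intersects(a, b):
    """Filter test: do two MBRs overlap (touching counts)?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cells_for(mbr, cell_size):
    """Yield every grid cell an MBR overlaps (multi-assignment)."""
    x0, x1 = int(mbr[0] // cell_size), int(mbr[2] // cell_size)
    y0, y1 = int(mbr[1] // cell_size), int(mbr[3] // cell_size)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))

def partition(objects, cell_size):
    """Map each grid cell to the ids of the objects assigned to it."""
    buckets = defaultdict(list)
    for oid, mbr in objects.items():
        for cell in cells_for(mbr, cell_size):
            buckets[cell].append(oid)  # duplicated if it spans cells
    return buckets

def filter_step(r, s, cell_size):
    """Return candidate pairs whose MBRs intersect.

    Each cell is an independent task that could run on its own
    processor. The set removes pairs found in more than one cell.
    A refine step (exact geometry test) would follow; it is omitted.
    """
    r_parts = partition(r, cell_size)
    s_parts = partition(s, cell_size)
    candidates = set()
    for cell, r_ids in r_parts.items():
        for rid in r_ids:
            for sid in s_parts.get(cell, ()):
                if intersects(r[rid], s[sid]):
                    candidates.add((rid, sid))
    return candidates

r = {"r1": (0, 0, 5, 5), "r2": (10, 10, 11, 11)}
s = {"s1": (2, 2, 6, 6), "s2": (20, 20, 21, 21)}
print(filter_step(r, s, cell_size=4))
```

In this sketch `r1` and `s1` each fall into four cells, so the same candidate pair is produced four times before deduplication. This redundant work, and the copies that would have to be shipped between processors, is the kind of overhead the abstract attributes to multi-assignment.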
