Data Partitioning for Parallel Spatial Join Processing

Spatial join processing can be very expensive because spatial objects are large and spatial operations are computation-intensive. Parallel processing seems a natural solution to this problem, but it is not clear how spatial data should be partitioned for this purpose. This paper examines various spatial data partitioning methods. We propose a framework for parallel spatial join processing that combines the data-partitioning techniques used by most parallel join algorithms in relational databases with the filter-and-refine strategy for spatial operation processing. Object duplication caused by multi-assignment in spatial data partitioning can incur extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show that near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
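
To make the multi-assignment issue concrete, the following is a minimal sketch of the filter step over a partitioned data set. It is not the algorithm proposed in the paper: the uniform grid, the `cell` size, the representation of objects as minimum bounding rectangles `(xmin, ymin, xmax, ymax)`, and all function names are assumptions chosen for illustration. An object that overlaps several tiles is copied into each of them, which is the duplication the abstract refers to, and the same candidate pair can then be reported by more than one tile.

```python
from collections import defaultdict
from itertools import product


def tiles_for(mbr, cell):
    """Return every grid tile (ix, iy) overlapped by the MBR (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    xs = range(int(xmin // cell), int(xmax // cell) + 1)
    ys = range(int(ymin // cell), int(ymax // cell) + 1)
    return product(xs, ys)


def partition(objects, cell):
    """Multi-assignment: an object spanning several tiles is duplicated into each one."""
    buckets = defaultdict(list)
    for oid, mbr in objects.items():
        for tile in tiles_for(mbr, cell):
            buckets[tile].append(oid)
    return buckets


def overlaps(a, b):
    """True if two MBRs intersect (touching edges count as intersecting)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_step(r, s, cell):
    """Join R and S tile by tile; each tile is an independent task a worker could run."""
    r_buckets, s_buckets = partition(r, cell), partition(s, cell)
    candidates = set()  # the set drops pairs reported by more than one tile
    for tile, r_ids in r_buckets.items():
        for rid in r_ids:
            for sid in s_buckets.get(tile, []):
                if overlaps(r[rid], s[sid]):
                    candidates.add((rid, sid))
    return candidates


# Hypothetical sample data: object id -> MBR.
R = {"r1": (0, 0, 3, 3), "r2": (5, 5, 6, 6)}
S = {"s1": (2, 2, 4, 4), "s2": (8, 8, 9, 9)}
candidates = filter_step(R, S, cell=2)
```

In this sketch the candidate pairs would still need a refinement step that tests the exact geometries, which is omitted here. A smaller `cell` yields more tasks but also more duplicated objects, which is one way to see why preserving spatial locality in task decomposition matters.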
