TOUCH: in-memory spatial join by hierarchical data-oriented partitioning

Efficient spatial joins are pivotal for many applications, particularly geographical information systems and the simulation sciences, where scientists work with spatial models. Past research has primarily focused on disk-based spatial joins. Efficient in-memory approaches, however, are important for two reasons: (a) main memory has grown so large that many datasets fit in it, and (b) the in-memory join is a very time-consuming part of all disk-based spatial joins. In this paper we develop TOUCH, a novel in-memory spatial join algorithm that uses hierarchical data-oriented space partitioning, thereby keeping both its memory footprint and the number of comparisons low. Our results show that TOUCH outperforms known in-memory spatial join algorithms as well as in-memory implementations of disk-based join approaches. In particular, it has an order-of-magnitude advantage over the memory-demanding state of the art in both the number of pairwise object comparisons and execution time, and it is two orders of magnitude faster than approaches with a similar memory footprint. Furthermore, TOUCH scales better than competing approaches as data density grows.
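
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a hierarchical, data-oriented partitioning can cut the number of pairwise comparisons in an in-memory spatial join. It is not the authors' implementation. It assumes 2D axis-aligned bounding boxes, a non-empty dataset A, and a simple packing of A by sorted x-center; all names (`build`, `assign`, `join`, `fanout`) are hypothetical. Each object of B is pushed down the hierarchy built on A for as long as it overlaps exactly one child, and is then compared only against the A objects below the node where it stops.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed layout


def intersects(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes overlap (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(boxes: List[Box]) -> Box:
    """Minimum bounding rectangle of a non-empty list of boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


@dataclass
class Node:
    box: Box
    children: List["Node"] = field(default_factory=list)
    objects: List[int] = field(default_factory=list)   # indices into A (leaves only)
    assigned: List[int] = field(default_factory=list)  # indices into B


def build(a: List[Box], fanout: int = 2) -> Node:
    """Pack A bottom-up into a hierarchy (assumption: grouping by sorted x-center)."""
    order = sorted(range(len(a)), key=lambda i: a[i][0] + a[i][2])
    level = [Node(mbr([a[i] for i in order[k:k + fanout]]), objects=order[k:k + fanout])
             for k in range(0, len(order), fanout)]
    while len(level) > 1:
        level = [Node(mbr([n.box for n in level[k:k + fanout]]), children=level[k:k + fanout])
                 for k in range(0, len(level), fanout)]
    return level[0]


def assign(root: Node, b: List[Box]) -> None:
    """Attach each B object to the lowest node that still covers all its candidates."""
    for j, box in enumerate(b):
        if not intersects(root.box, box):
            continue  # cannot intersect any object of A
        node = root
        while node is not None and node.children:
            hits = [c for c in node.children if intersects(c.box, box)]
            if len(hits) == 1:
                node = hits[0]      # keep descending
            elif not hits:
                node = None         # falls in a gap between children: filtered out
            else:
                break               # overlaps several children: stop here
        if node is not None:
            node.assigned.append(j)


def subtree_objects(node: Node) -> List[int]:
    """All A indices stored in the leaves below a node."""
    if not node.children:
        return list(node.objects)
    out: List[int] = []
    for child in node.children:
        out.extend(subtree_objects(child))
    return out


def join(a: List[Box], b: List[Box], fanout: int = 2):
    """Return (intersecting index pairs, number of pairwise comparisons performed)."""
    root = build(a, fanout)
    assign(root, b)
    results, comparisons, stack = [], 0, [root]
    while stack:
        node = stack.pop()
        if node.assigned:
            candidates = subtree_objects(node)
            for j in node.assigned:
                for i in candidates:
                    comparisons += 1
                    if intersects(a[i], b[j]):
                        results.append((i, j))
        stack.extend(node.children)
    return results, comparisons


if __name__ == "__main__":
    A = [(0, 0, 1, 1), (2, 0, 3, 1), (10, 10, 11, 11), (12, 10, 13, 11)]
    B = [(0.5, 0.5, 0.6, 0.6), (2.5, 0.5, 10.5, 10.5), (20, 20, 21, 21), (5, 5, 6, 6)]
    pairs, n = join(A, B)
    print(sorted(pairs), n, "comparisons vs", len(A) * len(B), "for a nested loop")
```

Because every B object is attached to exactly one node, no result pair is reported twice, and B objects that fall outside the hierarchy or into gaps between sibling boxes are discarded without any object-level comparison. The hierarchy adds only one node per group of A objects, which is the sense in which such a scheme can keep both memory use and comparison counts low.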
