Paper Information - In-Memory Spatial Join: The Data Matters!

In-Memory Spatial Join: The Data Matters!

A spatial join computes all pairs of spatial objects in two datasets that satisfy a distance constraint. Increasing demand from applications ranging from human brain analysis to transportation data analysis motivates the design of new in-memory spatial join algorithms. Among recent proposals, the following six algorithms can efficiently perform in-memory spatial joins: Size Separation Spatial Join (S3), Spatial Grid Hash Join (SGrid), TOUCH, Partition Based Spatial-Merge Join (PBSM), Plane-Sweep Join (PS), and Nested-Loop Join (NL). This paper addresses the need for studies of the aspects that may influence the performance of spatial join algorithms. In particular, given two datasets, A and B, the following aspects may affect performance: whether the datasets are real or synthetic, the distributions of the datasets with respect to density and location, and the order in which the spatial join is performed (A ⋈ B or B ⋈ A). To study the effects of these aspects on performance, we implement the six spatial join algorithms in a single framework and conduct extensive experiments. The findings show that whether the data is real or synthetic, the data distribution, and the join order can each substantially influence the performance of the algorithms. We present detailed findings that offer insight into different facets of each algorithm and that enable comparison across algorithms and datasets. Furthermore, we provide advice on choosing among the spatial join algorithms based on the empirical evaluation.
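To make the join definition concrete, the following is a minimal Python sketch of two of the six algorithm families named above, a nested-loop join and a plane-sweep join, under simplifying assumptions: objects are 2D points rather than general spatial objects, the distance constraint is a Euclidean threshold `eps`, and the function names (`nested_loop_join`, `plane_sweep_join`) are hypothetical. It is not the paper's framework or implementation.

```python
from math import hypot


def nested_loop_join(a, b, eps):
    """Return all index pairs (i, j) with dist(a[i], b[j]) <= eps.

    Assumption: a and b are lists of (x, y) tuples; every pair is tested.
    """
    return [
        (i, j)
        for i, (ax, ay) in enumerate(a)
        for j, (bx, by) in enumerate(b)
        if hypot(ax - bx, ay - by) <= eps
    ]


def plane_sweep_join(a, b, eps):
    """Return the same pairs as nested_loop_join, sorted by (i, j).

    Both inputs are sorted on x, and a point of `a` is only compared
    with points of `b` whose x-coordinate lies within eps of it.
    """
    a_sorted = sorted(enumerate(a), key=lambda t: t[1][0])
    b_sorted = sorted(enumerate(b), key=lambda t: t[1][0])
    result = []
    start = 0  # first b point that may still be within eps on the x-axis
    for i, (ax, ay) in a_sorted:
        # Skip b points that lie too far to the left of the sweep position.
        while start < len(b_sorted) and b_sorted[start][1][0] < ax - eps:
            start += 1
        k = start
        # Check candidates until they lie too far to the right.
        while k < len(b_sorted) and b_sorted[k][1][0] <= ax + eps:
            j, (bx, by) = b_sorted[k]
            if hypot(ax - bx, ay - by) <= eps:
                result.append((i, j))
            k += 1
    return sorted(result)


if __name__ == "__main__":
    A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
    B = [(0.0, 0.5), (5.0, 5.2), (9.0, 9.0)]
    print(nested_loop_join(A, B, 1.0))
    print(plane_sweep_join(A, B, 1.0))
```

In this sketch, swapping the inputs (joining B with A instead of A with B) yields the same pairs with the indices reversed. The result set is therefore independent of join order, and what can change is the amount of work an algorithm does to produce it.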

Christian S. Jensen | Sadegh Heyrani-Nobari | Qiang Qu
