Scalable spatial query processing relies on effective spatial data partitioning for query parallelization, data pruning, and load balancing. These goals are often challenged by the intrinsic characteristics of spatial data, such as high skew in data distribution and the high complexity of irregular multi-dimensional objects. In this demo, we present SATO, a spatial data partitioning framework that can quickly analyze and partition spatial data with an optimal spatial partitioning strategy for scalable query processing. SATO works in the following steps: 1) Sample, which samples a small fraction of the input data for analysis; 2) Analyze, which quickly analyzes the sampled data to find an optimal partitioning strategy; 3) Tear, which provides data-skew-aware partitioning and supports MapReduce-based scalable partitioning; and 4) Optimize, which collects succinct partition statistics for potential query optimization. SATO also provides multi-level partitioning, which can be used to significantly improve window-based queries in cloud-based spatial query processing systems. SATO comes with a visualization component that provides heat maps and histograms for qualitative evaluation. SATO has been implemented within Hadoop-GIS, a high-performance spatial data warehousing system over MapReduce. SATO is also released as an independent software package to support various scalable spatial query processing systems. Our experiments have demonstrated that SATO can generate much more balanced partitioning, which can significantly improve spatial query performance with MapReduce compared to traditional spatial partitioning approaches.
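As an illustration only, the four steps (Sample, Analyze, Tear, Optimize) can be sketched in Python on 2D points. Everything in this sketch is an assumption made for illustration: the function names, the parameters `fraction` and `max_per_cell`, and the choice of a recursive median split as the partitioning strategy. It is not SATO's actual implementation or API, it runs on a single machine rather than on MapReduce, and it ignores irregular objects that span partition boundaries.

```python
import random
from collections import defaultdict

INF = float("inf")

def sample(points, fraction, seed=0):
    """Sample: keep roughly `fraction` of the input points."""
    rng = random.Random(seed)
    picked = [p for p in points if rng.random() < fraction]
    return picked or points[:1]

def analyze(sampled, max_per_cell):
    """Analyze: split the sample at the median, alternating axes, until each
    cell holds at most `max_per_cell` sampled points. Returns half-open cells
    (xmin, ymin, xmax, ymax) that together cover the whole plane."""
    def split(pts, box, depth):
        if len(pts) <= max_per_cell:
            return [box]
        axis = depth % 2
        cut = sorted(p[axis] for p in pts)[len(pts) // 2]
        low = [p for p in pts if p[axis] < cut]
        high = [p for p in pts if p[axis] >= cut]
        if not low:  # many equal coordinates: stop here (simplification)
            return [box]
        xmin, ymin, xmax, ymax = box
        if axis == 0:
            low_box, high_box = (xmin, ymin, cut, ymax), (cut, ymin, xmax, ymax)
        else:
            low_box, high_box = (xmin, ymin, xmax, cut), (xmin, cut, xmax, ymax)
        return split(low, low_box, depth + 1) + split(high, high_box, depth + 1)
    return split(list(sampled), (-INF, -INF, INF, INF), 0)

def tear(points, cells):
    """Tear: assign every point of the full data set to the cell containing it."""
    parts = defaultdict(list)
    for x, y in points:
        for i, (xmin, ymin, xmax, ymax) in enumerate(cells):
            if xmin <= x < xmax and ymin <= y < ymax:
                parts[i].append((x, y))
                break
    return parts

def optimize(parts):
    """Optimize: collect a count and a minimum bounding rectangle per partition."""
    stats = {}
    for i, pts in parts.items():
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        stats[i] = {"count": len(pts), "mbr": (min(xs), min(ys), max(xs), max(ys))}
    return stats

def run_pipeline(points, fraction=0.1, max_per_cell=10):
    """Chain the four steps and return the cells plus per-partition statistics."""
    cells = analyze(sample(points, fraction), max_per_cell)
    return cells, optimize(tear(points, cells))

if __name__ == "__main__":
    rng = random.Random(42)
    # Skewed input: a dense cluster plus sparse background points.
    data = [(rng.random(), rng.random()) for _ in range(900)]
    data += [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(100)]
    cells, stats = run_pipeline(data)
    print(len(cells), "cells; largest partition:", max(s["count"] for s in stats.values()))
```

Because the cell boundaries are derived from the sample, dense regions receive more and smaller cells than sparse ones, which is the kind of skew awareness the Tear step is described as providing. The per-partition counts and bounding rectangles stand in for the succinct statistics that a query optimizer could use to prune partitions.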