Accelerating Cross-Matching Operation of Geospatial Datasets using a CPU-GPU Hybrid Platform

The spatial cross-matching operation over geospatial polygonal datasets is important to a variety of GIS applications. However, it has a high computational cost, because the intersection and union must be computed for each polygon pair drawn from large-scale datasets. This motivates exploring parallel computing resources such as GPUs to improve the efficiency of such operations. In this paper, we present a CPU-GPU hybrid platform that accelerates the cross-matching of geospatial datasets, in which computing tasks are dynamically scheduled for execution on either the CPU or the GPU. To support processing geospatial datasets on the GPU with a pixelization approach, we convert floating-point vertices into integer vertices using an adaptive scaling factor computed as a function of the area of the minimum bounding box. We evaluate our framework on the Natural Earth dataset, with 280,000 polygon pairs in one tile and 400 tiles in total, and achieve a 10x speedup on an NVIDIA GeForce GTX 750 GPU and a 14x speedup on a Tesla K80 GPU. We also investigate how input data size affects the I/O-to-computation ratio and find that a sufficiently large input is required to make good use of the GPU's computing power. Finally, a comparison of the two GPUs shows that efficient cross-matching can be achieved with a cost-effective GPU.
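The pixelization idea can be illustrated with a small sketch that estimates the overlap of one polygon pair by rasterizing both polygons onto a shared integer grid. Several details are assumptions, not the authors' method. The match score is taken to be the intersection-over-union (Jaccard) ratio. `adaptive_scale` applies a hypothetical rule that sizes the grid to roughly `target_pixels` cells based on the area of the pair's minimum bounding box. The per-pixel loop runs serially on the CPU, whereas a GPU version would typically assign pixels to threads.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def mbb(*polys: List[Point]) -> Tuple[float, float, float, float]:
    """Minimum bounding box (xmin, ymin, xmax, ymax) covering all given polygons."""
    xs = [x for p in polys for x, _ in p]
    ys = [y for p in polys for _, y in p]
    return min(xs), min(ys), max(xs), max(ys)


def adaptive_scale(box, target_pixels: int = 10_000) -> float:
    """Hypothetical rule: choose a scale so the scaled MBB covers ~target_pixels pixels."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 1.0  # degenerate box; fall back to unit scale
    return math.sqrt(target_pixels / area)


def to_integer(poly: List[Point], box, scale: float) -> List[Tuple[int, int]]:
    """Shift vertices to the MBB origin, scale them, and round to integer coordinates."""
    x0, y0, _, _ = box
    return [(round((x - x0) * scale), round((y - y0) * scale)) for x, y in poly]


def inside(px: float, py: float, poly) -> bool:
    """Even-odd ray-casting point-in-polygon test."""
    result = False
    n = len(poly)
    for i in range(n):
        xa, ya = poly[i]
        xb, yb = poly[(i + 1) % n]
        if (ya > py) != (yb > py):  # edge straddles the horizontal ray
            x_cross = xa + (py - ya) * (xb - xa) / (yb - ya)
            if px < x_cross:
                result = not result
    return result


def pixelized_jaccard(p: List[Point], q: List[Point], target_pixels: int = 10_000) -> float:
    """Estimate area(p & q) / area(p | q) by counting pixel centres on an integer grid."""
    box = mbb(p, q)
    s = adaptive_scale(box, target_pixels)
    ip, iq = to_integer(p, box, s), to_integer(q, box, s)
    w = math.ceil((box[2] - box[0]) * s)
    h = math.ceil((box[3] - box[1]) * s)
    inter = union = 0
    for gx in range(w):
        for gy in range(h):
            cx, cy = gx + 0.5, gy + 0.5  # test each pixel at its centre
            a, b = inside(cx, cy, ip), inside(cx, cy, iq)
            inter += int(a and b)
            union += int(a or b)
    return inter / union if union else 0.0
```

For two unit squares offset by half their width, `pixelized_jaccard` returns about 0.336 with the default grid, close to the exact ratio of 1/3. Raising `target_pixels` tightens the estimate but requires more point-in-polygon tests. This is presumably the accuracy-versus-cost balance that an area-dependent scaling factor is meant to manage.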
