Accelerating Spatial Cross-Matching on CPU-GPU Hybrid Platform With CUDA and OpenACC

Spatial cross-matching over geospatial polygonal datasets is a highly compute-intensive yet essential task for a wide array of real-world applications. At the same time, modern computing systems are typically equipped with multiple processing units capable of task parallelization and optimization at various levels. This calls for the exploration of novel strategies in the geospatial domain that focus on efficient utilization of computing resources such as CPUs and GPUs. In this paper, we present a CPU-GPU hybrid platform to accelerate the cross-matching operation on geospatial datasets. We propose a pipeline of geospatial subtasks that are dynamically scheduled for execution on either the CPU or the GPU. To accommodate the processing of geospatial datasets on the GPU with a pixelization approach, we convert floating-point vertices into integer-valued vertices using an adaptive scaling factor that is a function of the area of the minimum bounding box. We present a comparative analysis of GPU-enabled cross-matching algorithm implementations in CUDA and in OpenACC-accelerated C++. We test our implementations on Natural Earth Data. Our results indicate that although the CUDA-based implementations provide better performance, the OpenACC-accelerated implementations are more portable and extensible while still providing a considerable performance gain over the CPU. We also investigate the effect of input data size on the I/O-to-computation ratio and note that a larger dataset compensates for the I/O overheads associated with GPU computation. Finally, we demonstrate that efficient cross-matching can be achieved with a cost-effective GPU.
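The vertex conversion described above can be illustrated with a minimal C++ sketch. The abstract states only that the scaling factor is a function of the area of the minimum bounding box; the specific rule used here (choose the scale so that the scaled box covers roughly `target_pixels` pixels) is an assumption made for illustration, as are all type and function names (`PointF`, `PointI`, `BoundingBox`, `compute_mbb`, `adaptive_scale`, `to_integer_vertices`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical types for this sketch; not taken from the paper.
struct PointF { double x, y; };
struct PointI { std::int32_t x, y; };
struct BoundingBox { double min_x, min_y, max_x, max_y; };

// Minimum bounding box (MBB) of a non-empty vertex list.
BoundingBox compute_mbb(const std::vector<PointF>& poly) {
    assert(!poly.empty());
    BoundingBox b{poly[0].x, poly[0].y, poly[0].x, poly[0].y};
    for (const PointF& p : poly) {
        b.min_x = std::min(b.min_x, p.x);
        b.min_y = std::min(b.min_y, p.y);
        b.max_x = std::max(b.max_x, p.x);
        b.max_y = std::max(b.max_y, p.y);
    }
    return b;
}

// ASSUMED rule: pick the scale so that the scaled MBB covers about
// target_pixels pixels. The paper's actual function may differ.
double adaptive_scale(const BoundingBox& b, double target_pixels = 1.0e6) {
    const double area = (b.max_x - b.min_x) * (b.max_y - b.min_y);
    if (area <= 0.0) return 1.0;  // degenerate box: leave coordinates unscaled
    return std::sqrt(target_pixels / area);
}

// Shift vertices to the MBB origin, scale, and round to the nearest integer.
std::vector<PointI> to_integer_vertices(const std::vector<PointF>& poly,
                                        const BoundingBox& b, double scale) {
    std::vector<PointI> out;
    out.reserve(poly.size());
    for (const PointF& p : poly) {
        out.push_back({
            static_cast<std::int32_t>(std::lround((p.x - b.min_x) * scale)),
            static_cast<std::int32_t>(std::lround((p.y - b.min_y) * scale))});
    }
    return out;
}
```

When two polygons are compared on the same pixel grid, both would presumably need the same origin and scale, for example by computing the bounding box over the union of their vertices before calling `adaptive_scale`.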
