Data Parallel Quadtree Indexing and Spatial Query Processing of Complex Polygon Data on GPUs

The fast-growing computing power of commodity parallel hardware presents both an opportunity and a challenge for large-scale data management. While GPU (Graphics Processing Unit) computing is conceptually an excellent match for spatial data management, which is both data- and computing-intensive, the complexity of multi-dimensional spatial indexing and query processing techniques has made it difficult to port existing serial algorithms to GPUs. In this study, we propose a strategy for spatial data management based on parallel primitives. We present data parallel designs for polygon decomposition, quadtree construction and spatial query processing. These designs can be realized on GPUs, on multi-core CPUs, and on future generations of hardware, provided that parallel libraries supporting the primitives are available. Using a large-scale geo-referenced species distribution dataset as an example, the GPU-based implementations achieve up to 190X speedups over serial CPU implementations and 14X speedups over 16-core CPU implementations for polygon decomposition, which is the most computing-intensive module in the end-to-end spatial data management solution we provide. For the quadtree construction and spatial range/polygon query modules, which are more data-intensive, the speedups over single-core and multi-core CPUs are up to 27X and 2X, respectively, depending on workloads. Compared with a similar polygon decomposition technique realized in a native parallel programming language, our implementation based on parallel primitives is up to 3X faster on the species distribution dataset. The results suggest that simplicity and efficiency can be achieved simultaneously with the data parallel design strategy by identifying the inherent data parallelisms in application domains.
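To make the idea of a design built from parallel primitives more concrete, the following is a minimal sketch of one quadtree construction step expressed as map, sort and reduce-by-key. It is an illustration under stated assumptions, not the implementation evaluated here: the choice of Morton (Z-order) keys, the function names `morton_code`, `reduce_by_key` and `quadrant_counts`, and the input format (grid cells as `(col, row)` pairs) are all hypothetical, and sequential Python stands in for a parallel library.

```python
from itertools import groupby


def morton_code(col, row, level):
    """Interleave the low `level` bits of col and row into a Z-order key."""
    code = 0
    for bit in range(level):
        code |= ((col >> bit) & 1) << (2 * bit)
        code |= ((row >> bit) & 1) << (2 * bit + 1)
    return code


def reduce_by_key(sorted_keys):
    """Count each run of equal keys in a sorted sequence (segmented reduction)."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(sorted_keys)]


def quadrant_counts(cells, max_level, level):
    """Return (quadrant key, cell count) pairs for non-empty quadrants at `level`.

    Assumption: `cells` holds (col, row) pairs on a 2**max_level square grid.
    """
    # map: one Morton key per cell at full resolution
    keys = [morton_code(col, row, max_level) for col, row in cells]
    # sort: cells in the same quadrant become adjacent
    keys.sort()
    # map: drop the low bits to obtain the ancestor quadrant at `level`
    parents = [key >> (2 * (max_level - level)) for key in keys]
    # reduce_by_key: one output element per non-empty quadrant
    return reduce_by_key(parents)


if __name__ == "__main__":
    cells = [(0, 0), (1, 0), (3, 3), (2, 1)]
    print(quadrant_counts(cells, max_level=2, level=1))
```

Each step operates on whole arrays with no dependency between elements other than the sort and the segmented reduction, which is what allows the same design to run on any hardware whose library provides those primitives.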