Efficient and Scalable Parallel Zonal Statistics on Large-Scale Species Occurrence Data on GPUs

Analyzing how species are distributed on the Earth has been one of the fundamental questions at the intersection of environmental sciences, geosciences and biological sciences. With worldwide data contributions, more than 375 million species occurrence records for nearly 1.5 million species have been deposited in the Global Biodiversity Information Facility (GBIF) data portal. The sheer volume of point and polygon data and the computation-intensive point-in-polygon tests required for zonal statistics in biodiversity studies pose significant technical challenges. In this study, we significantly extend our previous work on parallel-primitives-based spatial joins on commodity Graphics Processing Units (GPUs) and develop new efficient and scalable techniques that enable parallel zonal statistics on the GBIF data entirely on GPUs with limited memory capacity. Experimental results show that an end-to-end response time under 100 seconds can be achieved for zonal statistics on the 375+ million species records over 15+ thousand global eco-regions with 4+ million vertices on a single Nvidia Quadro 6000 GPU device. The achieved performance, which is several orders of magnitude faster than reference serial implementations using traditional open source geospatial techniques, not only demonstrates the potential of GPU computing for large-scale geospatial processing, but also makes interactive, query-driven visual exploration of global biodiversity data possible.
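To make the core operation concrete, the following is a minimal serial sketch of point-in-polygon based zonal statistics: each occurrence point is tested against each zone polygon and per-zone species counts are accumulated. It is an illustration only, not the GPU implementation described in this study; the function names (`point_in_polygon`, `bounding_box`, `zonal_counts`), the single-ring polygon representation, and the bounding-box pre-filter are all assumptions made for the example.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Even-odd ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(ring):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a ring."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def zonal_counts(points, zones):
    """Count species occurrences per zone.

    points: iterable of (species_id, x, y)
    zones:  dict mapping zone_id -> ring (list of (x, y) vertices)
    Returns a dict mapping zone_id -> Counter of species_id -> count.
    """
    boxes = {zid: bounding_box(ring) for zid, ring in zones.items()}
    counts = {zid: Counter() for zid in zones}
    for species_id, x, y in points:
        for zid, ring in zones.items():
            xmin, ymin, xmax, ymax = boxes[zid]
            # Cheap bounding-box rejection before the exact test.
            if xmin <= x <= xmax and ymin <= y <= ymax:
                if point_in_polygon(x, y, ring):
                    counts[zid][species_id] += 1
    return counts
```

This brute-force loop costs on the order of (number of points) × (number of polygon vertices) edge tests in the worst case, which is what makes the problem computation-intensive at the scale of hundreds of millions of points and millions of vertices.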
