Enabling point pattern analysis on spatial big data using cloud computing: optimizing and accelerating Ripley’s K function

ABSTRACT

Performing point pattern analysis with Ripley’s K function on large sets of point events is computationally intensive because it involves massive numbers of pairwise point comparisons, time-consuming calculation of edge-effect correction weights, and a large number of simulations. This article presents two strategies for optimizing the K function algorithm and uses cloud computing to accelerate the optimized algorithm further. The first optimization sorts the points on their x and y coordinates, which narrows the search for neighboring points down to a rectangular area around each point when estimating the K function. Using the actual study area to compute edge-effect correction weights is essential for an unbiased estimate of the K function, but it is very computationally intensive when the study area has a complex shape. The second optimization therefore reuses previously computed weights to avoid repeating this expensive calculation. The optimized algorithm was then parallelized with Open Multi-Processing (OpenMP) and with hybrid Message Passing Interface (MPI)/OpenMP on a cloud computing platform. Performance tests showed that the optimizations accelerated point pattern analysis with the K function by a factor of 8 in both the sequential and the OpenMP-parallel versions of the optimized algorithm. The OpenMP-based parallelization scaled well with both the number of CPU cores and the problem size, while the hybrid MPI/OpenMP-based parallelization, by drawing on computing resources across multiple nodes, substantially shortened the time needed to estimate the K function and perform the simulations. The computational challenge posed by point pattern analysis of large point datasets involving many simulations can thus be addressed with elastic, distributed cloud resources.
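
The two optimizations described above can be illustrated with a minimal Python sketch. It is not the article's implementation, and it rests on several assumptions. The study area is a simple polygon given as a list of vertices. Ripley's isotropic edge correction is approximated numerically by sampling points on each circle and testing them against the polygon. Points are sorted on x only, with y filtered per candidate, rather than sorted on both coordinates. "Reusing weights" is read as computing each pair's weight once and reusing it for every evaluation distance. All function names (k_function, isotropic_weight, csr_envelope, and so on) are hypothetical. The estimator used is K(r) = A / (n(n - 1)) * sum over i != j of w_ij * 1(d_ij <= r), where A is the study-area size and w_ij is the edge-correction weight.

```python
import bisect
import math
import random


def point_in_polygon(x, y, poly):
    """Ray-casting test: True if (x, y) is inside the simple polygon poly [(x, y), ...]."""
    inside = False
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        # Toggle each time a ray from (x, y) towards +x crosses this edge.
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_area(poly):
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def isotropic_weight(px, py, d, poly, n_angles=360):
    """Approximate Ripley's isotropic weight: 1 / (fraction of the circle of radius d
    centred on (px, py) that lies inside poly), estimated from n_angles sample points.
    Assumption: numeric sampling stands in for an exact circle/polygon intersection."""
    hits = sum(
        point_in_polygon(px + d * math.cos(2 * math.pi * k / n_angles),
                         py + d * math.sin(2 * math.pi * k / n_angles), poly)
        for k in range(n_angles)
    )
    return n_angles / max(hits, 1)  # max() guards against a zero count


def k_function(points, radii, poly):
    """Estimate K(r) for each r in radii (positive) over study area poly; needs n >= 2."""
    n = len(points)
    rs = sorted(radii)
    r_max = rs[-1]
    pts = sorted(points)  # optimization 1 (simplified): sort on x, filter on y below
    xs = [p[0] for p in pts]
    bins = [0.0] * len(rs)  # bins[m]: weights of pairs with rs[m-1] < d <= rs[m]
    for i, (xi, yi) in enumerate(pts):
        # Only points with x in [xi - r_max, xi + r_max] can be neighbours ...
        lo = bisect.bisect_left(xs, xi - r_max)
        hi = bisect.bisect_right(xs, xi + r_max)
        for j in range(lo, hi):
            xj, yj = pts[j]
            # ... and, of those, only points with |dy| <= r_max (a square around point i).
            if j == i or abs(yj - yi) > r_max:
                continue
            d = math.hypot(xj - xi, yj - yi)
            if d > r_max:
                continue
            # Optimization 2 (as assumed here): one weight per ordered pair, binned
            # once and reused for every r >= d through the cumulative sum below.
            bins[bisect.bisect_left(rs, d)] += isotropic_weight(xi, yi, d, poly)
    scale = polygon_area(poly) / (n * (n - 1))
    est, running = {}, 0.0
    for r, b in zip(rs, bins):
        running += b
        est[r] = scale * running
    return [est[r] for r in radii]


def csr_envelope(n, radii, poly, n_sim, seed=0):
    """Min/max of K(r) over n_sim patterns of complete spatial randomness in poly.
    The simulations are independent of one another, so they parallelize naturally."""
    rng = random.Random(seed)
    x0, x1 = min(p[0] for p in poly), max(p[0] for p in poly)
    y0, y1 = min(p[1] for p in poly), max(p[1] for p in poly)
    lows, highs = [math.inf] * len(radii), [-math.inf] * len(radii)
    for _ in range(n_sim):
        sim = []
        while len(sim) < n:  # rejection sampling from the bounding box
            x, y = rng.uniform(x0, x1), rng.uniform(y0, y1)
            if point_in_polygon(x, y, poly):
                sim.append((x, y))
        for m, k in enumerate(k_function(sim, radii, poly)):
            lows[m], highs[m] = min(lows[m], k), max(highs[m], k)
    return lows, highs
```

For example, k_function(pts, [0.05, 0.1, 0.2], [(0, 0), (1, 0), (1, 1), (0, 1)]) estimates K at three distances in the unit square. For a pattern of complete spatial randomness, the values should be roughly pi * r^2. The iterations of the outer loop over points are independent, and so are the simulations in csr_envelope. This independence is what makes OpenMP and MPI parallelization straightforward, although this sketch keeps everything sequential.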