DEVELOPING APACHE SPARK BASED RIPLEY’S K FUNCTIONS FOR ACCELERATING SPATIOTEMPORAL POINT PATTERN ANALYSIS

Abstract. Ripley’s K functions are powerful tools for studying the spatial arrangement or spatiotemporal distribution of geographic phenomena and events, and they have been used in many fields. However, K functions are computationally intensive because they require pointwise distance comparisons, edge correction, and simulations for significance testing. Although parallel computing technologies have been adopted to accelerate K functions, previous work has not extended these optimizations from the spatial to the spatiotemporal dimension. This study presents an acceleration method for K functions built on the state-of-the-art distributed computing framework Apache Spark. It leverages four optimization strategies that simplify the calculation procedures and accelerate distributed computing: 1) spatiotemporal indexing based on an R-tree packed with the Sort-Tile-Recursive (STR) algorithm, which reduces distance comparisons when retrieving potentially neighbouring points in space and time; 2) hash-table-based caching, which reuses spatiotemporal edge correction weights and avoids repetitive computation; 3) spatiotemporal partitioning using a KDB-tree together with a cylinder intersection redundancy strategy, which decreases ghost buffer redundancy in partitions and supports near-balanced distributed processing; and 4) customized serialization of spatiotemporal objects and indexes, which lowers data transmission overhead. Experiments verify the effectiveness and time efficiency of the proposed optimization strategies and evaluate the overall performance and scalability. Based on the proposed methods, a web-based visual analytics framework has been developed and publicly shared through GitHub. It implements four types of distributed K functions (space, space-time, local, and cross K functions), demonstrating its value for geographical and socioeconomic studies.
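For reference, a commonly used form of the space-time K function estimator is given below. Here $|A|$ is the area of the study region, $T$ the length of the study period, $n$ the number of events, $d_{ij}$ and $u_{ij} = |t_i - t_j|$ the spatial and temporal separations of events $i$ and $j$, and $w_{ij}$, $v_{ij}$ the spatial and temporal edge correction weights. This is a sketch of the standard estimator rather than the exact normalization used in this study; some references divide by $n^2$ instead of $n(n-1)$.

```latex
\hat{K}(s,t) = \frac{|A|\,T}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i}
  w_{ij}\, v_{ij}\, \mathbf{1}\left(d_{ij} \le s\right) \mathbf{1}\left(u_{ij} \le t\right)

\hat{K}(s) = \frac{|A|}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i}
  w_{ij}\, \mathbf{1}\left(d_{ij} \le s\right)
```

The second line is the purely spatial estimator obtained by dropping the temporal terms. Every evaluation visits all $n(n-1)$ ordered pairs, and a Monte Carlo significance test with $m$ simulated patterns repeats this work $m + 1$ times for each threshold pair $(s, t)$. That repetition is why the distance comparisons and edge corrections dominate the cost.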
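The estimator and the idea of caching edge correction weights in a hash table can be illustrated with a small, single-machine Python sketch. It assumes a rectangular study region $[0, W] \times [0, H]$ and study period $[0, T]$. It approximates Ripley's isotropic spatial weight by sampling points on the circle and uses the simple temporal weight (1 if both $t_i \pm u_{ij}$ fall inside the period, else 2). `Point`, `EdgeWeightCache`, and `space_time_k` are hypothetical names. The cache keys (pair indices for spatial weights, $(t_i, u_{ij})$ for temporal weights) are an assumption about what makes the weights reusable, not the paper's documented design.

```python
import math
from collections import namedtuple

# Hypothetical event record: planar coordinates plus a timestamp.
Point = namedtuple("Point", "x y t")


class EdgeWeightCache:
    """Hypothetical dict-based (hash-table) cache of edge correction weights."""

    def __init__(self, width, height, duration, n_angles=360):
        self.width, self.height, self.duration = width, height, duration
        self.n_angles = n_angles
        self.spatial = {}   # (i, j) -> spatial weight w_ij
        self.temporal = {}  # (t_i, u_ij) -> temporal weight v_ij

    def spatial_weight(self, i, j, p, q):
        key = (i, j)
        if key not in self.spatial:
            r = math.hypot(p.x - q.x, p.y - q.y)
            inside = 0
            for k in range(self.n_angles):
                a = 2 * math.pi * k / self.n_angles
                cx, cy = p.x + r * math.cos(a), p.y + r * math.sin(a)
                if 0 <= cx <= self.width and 0 <= cy <= self.height:
                    inside += 1
            # Approximate isotropic weight: 1 / (fraction of circle inside the region).
            self.spatial[key] = self.n_angles / max(inside, 1)
        return self.spatial[key]

    def temporal_weight(self, ti, u):
        key = (ti, u)
        if key not in self.temporal:
            both_inside = ti - u >= 0 and ti + u <= self.duration
            self.temporal[key] = 1.0 if both_inside else 2.0
        return self.temporal[key]


def space_time_k(points, s_values, t_values, cache):
    """Naive O(n^2) estimate of K(s, t) for every threshold pair (illustrative)."""
    n = len(points)
    volume = cache.width * cache.height * cache.duration  # |A| * T
    result = {}
    for s in s_values:
        for t in t_values:
            total = 0.0
            for i, p in enumerate(points):
                for j, q in enumerate(points):
                    if i == j:
                        continue
                    d = math.hypot(p.x - q.x, p.y - q.y)
                    u = abs(p.t - q.t)
                    if d <= s and u <= t:
                        total += cache.spatial_weight(i, j, p, q) * cache.temporal_weight(p.t, u)
            result[(s, t)] = volume * total / (n * (n - 1))
    return result
```

Because the thresholds are evaluated in separate passes here, each weight is computed once and then served from the dictionary on later passes. Temporal weights are also shared across pairs whenever timestamps are discrete. A production implementation would instead accumulate each pair into all thresholds in a single pass and prune pairs with a spatial index.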
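The Sort-Tile-Recursive idea behind the spatiotemporal index can be sketched in a few lines of Python. The procedure sorts by x and cuts the data into slabs, sorts each slab by y and cuts again, then sorts by t and fills leaves of at most `capacity` points. This sketch uses the hypothetical function names `str_pack`, `build_index`, and `neighbours`. It builds only the leaf level of the tree and treats time as a third coordinate. A full R-tree would pack the leaf bounding boxes recursively in the same way, and the paper's index may differ in detail.

```python
import math


def str_pack(points, capacity, dims=3):
    """Sort-Tile-Recursive (STR) packing of (x, y, t) tuples into leaves."""

    def pack(group, axis):
        group = sorted(group, key=lambda p: p[axis])
        if axis == dims - 1:
            # Last dimension: cut the sorted run into leaves of `capacity` points.
            return [group[k:k + capacity] for k in range(0, len(group), capacity)]
        n_leaves = math.ceil(len(group) / capacity)
        remaining = dims - axis
        slices = 1
        while slices ** remaining < n_leaves:  # integer ceil of the k-th root
            slices += 1
        slab = capacity * slices ** (remaining - 1)  # points per slab
        leaves = []
        for k in range(0, len(group), slab):
            leaves.extend(pack(group[k:k + slab], axis + 1))
        return leaves

    return pack(list(points), 0)


def build_index(points, capacity=16):
    """Leaf level of an STR-packed tree: (bounding box, points) pairs."""
    index = []
    for leaf in str_pack(points, capacity):
        box = tuple((min(p[a] for p in leaf), max(p[a] for p in leaf)) for a in range(3))
        index.append((box, leaf))
    return index


def neighbours(index, q, s, dt):
    """Points within planar distance s and time lag dt of query point q."""
    lo = (q[0] - s, q[1] - s, q[2] - dt)
    hi = (q[0] + s, q[1] + s, q[2] + dt)
    found = []
    for box, leaf in index:
        # Filter step: skip leaves whose box misses the search box.
        if all(box[a][0] <= hi[a] and box[a][1] >= lo[a] for a in range(3)):
            # Refine step: exact distance comparisons for surviving leaves.
            for p in leaf:
                if math.hypot(p[0] - q[0], p[1] - q[1]) <= s and abs(p[2] - q[2]) <= dt:
                    found.append(p)
    return found
```

The query first discards whole leaves whose bounding boxes miss the search box $[x - s, x + s] \times [y - s, y + s] \times [t - \Delta t, t + \Delta t]$. It performs exact distance comparisons only on the surviving leaves, which is where the savings over an all-pairs scan come from.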
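Partitioning with ghost buffers can be sketched in a similar way. Here the domain is split at the median along x, y, and t in turn, as a simplified, k-d-style stand-in for the KDB-tree partitioner. Each point gets one home cell, and a copy (ghost) is sent to every other cell that its search neighbourhood reaches. This sketch reads the cylinder intersection strategy as testing the true neighbourhood against each cell, instead of a box expanded by $s$ on every side. That true neighbourhood is a disk of radius $s$ extended over $[t - \Delta t, t + \Delta t]$. This interpretation and the names `kdb_cells`, `cylinder_hits`, `box_hits`, and `assign` are assumptions for illustration.

```python
import math


def kdb_cells(points, max_per_cell, bounds, axis=0):
    """Median splits along x, y, t in turn until a cell holds <= max_per_cell
    points. `bounds` is ((x0, x1), (y0, y1), (t0, t1)); returns leaf boxes."""
    if len(points) <= max_per_cell:
        return [bounds]
    pts = sorted(points, key=lambda p: p[axis])
    median = pts[len(pts) // 2][axis]
    left = [p for p in pts if p[axis] < median]
    right = [p for p in pts if p[axis] >= median]
    if not left:
        # All values on this axis equal the median; stop (a fuller version
        # would retry on the next axis).
        return [bounds]
    left_box, right_box = list(bounds), list(bounds)
    left_box[axis] = (bounds[axis][0], median)
    right_box[axis] = (median, bounds[axis][1])
    nxt = (axis + 1) % 3
    return (kdb_cells(left, max_per_cell, tuple(left_box), nxt)
            + kdb_cells(right, max_per_cell, tuple(right_box), nxt))


def contains(cell, p):
    return all(cell[a][0] <= p[a] <= cell[a][1] for a in range(3))


def cylinder_hits(cell, p, s, dt):
    """Does the cylinder (disk of radius s around p, times [t - dt, t + dt]) meet the cell?"""
    (x0, x1), (y0, y1), (t0, t1) = cell
    dx = max(x0 - p[0], 0.0, p[0] - x1)  # planar gap between p and the cell
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy) <= s and p[2] - dt <= t1 and p[2] + dt >= t0


def box_hits(cell, p, s, dt):
    """Baseline: does the box expanded by s (and dt in time) meet the cell?"""
    (x0, x1), (y0, y1), (t0, t1) = cell
    return (p[0] - s <= x1 and p[0] + s >= x0 and p[1] - s <= y1
            and p[1] + s >= y0 and p[2] - dt <= t1 and p[2] + dt >= t0)


def assign(points, cells, s, dt, hits=cylinder_hits):
    """Place each point in its home cell plus ghost copies; count the ghosts."""
    parts = [[] for _ in cells]
    ghosts = 0
    for p in points:
        # First cell that contains p is its home (cells tile `bounds`).
        home = next(k for k, c in enumerate(cells) if contains(c, p))
        for k, c in enumerate(cells):
            if k == home:
                parts[k].append((p, False))
            elif hits(c, p, s, dt):
                parts[k].append((p, True))
                ghosts += 1
    return parts, ghosts
```

A disk fits inside its bounding square, so `cylinder_hits` never reports more cells than `box_hits`. The cylinder test can therefore only reduce ghost copies; the savings come from cells that touch only the corners of the square.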
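Customized serialization saves space by writing only the needed fields in a fixed binary layout. The sketch below uses Python's `struct` module to pack hypothetical `(id, x, y, t)` records into 32 bytes each. It illustrates the idea only; a Spark implementation would typically register custom serializers (for example with Kryo) for its spatiotemporal objects and indexes. `RECORD`, `HEADER`, `encode`, and `decode` are illustrative names.

```python
import struct

# Hypothetical fixed layout: int64 id followed by float64 x, y, t (little-endian).
RECORD = struct.Struct("<qddd")  # 8 + 3 * 8 = 32 bytes, no padding with "<"
HEADER = struct.Struct("<I")     # uint32 record count


def encode(points):
    """Serialize (id, x, y, t) tuples into one compact byte string."""
    buf = bytearray(HEADER.pack(len(points)))
    for pid, x, y, t in points:
        buf += RECORD.pack(pid, x, y, t)
    return bytes(buf)


def decode(data):
    """Inverse of encode: rebuild the list of (id, x, y, t) tuples."""
    (count,) = HEADER.unpack_from(data, 0)
    return [RECORD.unpack_from(data, HEADER.size + k * RECORD.size) for k in range(count)]
```

Spark's tuning guide notes that its default Java serialization is comparatively slow and produces large serialized formats. A compact, schema-specific layout avoids that overhead when partitions and broadcast indexes are shipped between nodes.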
