A k-d tree-based algorithm to parallelize Kriging interpolation of big spatial data

Parallel computing offers a promising way to accelerate complex spatial data processing, which has become increasingly computationally intensive. Partitioning a big dataset into workload-balanced child data groups remains a challenge, particularly for unevenly distributed spatial data. This study proposes an algorithm based on the k-d tree method to tackle this challenge. The algorithm constructs trees based on the distribution variance of the spatial data. Unlike the conventional k-d tree method, the number of final sub-trees need not be a power of two. Furthermore, the difference between the numbers of nodes in the left and right sub-trees is never more than one, which ensures a balanced workload. Experiments show that the algorithm partitions big datasets efficiently and evenly into equally sized child data groups. Speed-up ratios show that parallel interpolation can save up to 70% of the execution time of sequential interpolation. High parallel efficiency was achieved when the datasets were divided into an optimal number of child data groups.
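The partitioning idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kd_partition`, the rule of splitting along the coordinate axis with the largest variance, and the way the point quota is divided between the two halves are all assumptions made for illustration. The sketch splits a point set into an arbitrary number of groups `k` (not necessarily a power of two), gives the left and right branches group counts that differ by at most one, and produces group sizes that differ by at most one point.

```python
import numpy as np


def kd_partition(points, k):
    """Split an (n, d) point array into k groups whose sizes differ by at most one.

    Illustrative sketch only; the splitting rules are assumptions, not the
    published algorithm.
    """
    points = np.asarray(points, dtype=float)
    if k < 1 or k > len(points):
        raise ValueError("k must satisfy 1 <= k <= number of points")
    if k == 1:
        return [points]

    # Assumed rule: cut along the axis on which the points are most spread out.
    axis = int(np.argmax(points.var(axis=0)))
    ordered = points[np.argsort(points[:, axis], kind="stable")]

    # Group counts of the two branches differ by at most one.
    k_left = k // 2
    k_right = k - k_left

    # Every final group should hold q or q + 1 points; hand the left branch
    # exactly the points its k_left groups are entitled to.
    q, r = divmod(len(ordered), k)
    n_left = k_left * q + min(r, k_left)

    return (kd_partition(ordered[:n_left], k_left)
            + kd_partition(ordered[n_left:], k_right))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Unevenly distributed sample: one dense cluster plus sparse background.
    cluster = rng.normal(loc=[2.0, 2.0], scale=0.1, size=(800, 2))
    background = rng.uniform(0.0, 10.0, size=(203, 2))
    data = np.vstack([cluster, background])

    groups = kd_partition(data, 5)  # 5 is deliberately not a power of two
    print([len(g) for g in groups])
```

Each returned group could then be handed to a separate worker for interpolation; because the groups are balanced by point count rather than by area, a dense cluster is divided among several groups instead of overloading one.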
