A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

Abstract: Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large-scale spatial data, input/output (I/O) can easily become the bottleneck in parallelizing the algorithm because of limited physical memory and very slow disk transfer rates. In this paper, we propose a stream tilling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. These tiles break the one-to-one mapping between the input and the computing process. We then build a streaming framework for scheduling the I/O processes and computing units. Each computing unit encapsulates an identical copy of the estimation algorithm, and multiple asynchronous computing units can work independently in parallel. Finally, our experiment demonstrates that stream tilling estimation efficiently alleviates the pressure of I/O-bound work, and that the measured speedup of the optimized version greatly exceeds that of the directly parallelized versions on shared memory systems with multi-core processors.
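The decomposition and scheduling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the input is a small in-memory elevation grid (a list of rows), the "topological expansion" is modelled as one shared row and column of samples per tile, the estimator is a plain two-triangles-per-cell sum, and a thread pool stands in for the computing units. All function names (`triangle_area`, `tile_surface_area`, `stream_tiles`, `estimate_surface_area`) are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def triangle_area(p, q, r):
    """Area of a 3D triangle, computed from the cross product of two edges."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)


def tile_surface_area(tile, cell=1.0):
    """Sum two triangle areas for every grid cell of one tile.

    Assumption: this simple estimator stands in for whatever estimation
    algorithm each computing unit encapsulates.
    """
    total = 0.0
    for i in range(len(tile) - 1):
        for j in range(len(tile[0]) - 1):
            a = (j * cell, i * cell, tile[i][j])
            b = ((j + 1) * cell, i * cell, tile[i][j + 1])
            c = (j * cell, (i + 1) * cell, tile[i + 1][j])
            d = ((j + 1) * cell, (i + 1) * cell, tile[i + 1][j + 1])
            total += triangle_area(a, b, c) + triangle_area(b, d, c)
    return total


def stream_tiles(dem, size):
    """Lazily yield tiles of at most size x size cells.

    Each tile carries one extra row and column of samples (the assumed
    expansion), so every cell is complete inside exactly one tile.
    """
    rows, cols = len(dem), len(dem[0])
    for r0 in range(0, rows - 1, size):
        for c0 in range(0, cols - 1, size):
            r1 = min(r0 + size, rows - 1)  # index of the last sample row
            c1 = min(c0 + size, cols - 1)  # index of the last sample column
            yield [row[c0:c1 + 1] for row in dem[r0:r1 + 1]]


def estimate_surface_area(dem, size=2, workers=4, cell=1.0):
    """Hand tiles to independent workers and add up the partial areas.

    Note: pool.map submits every tile up front, so this sketch does not
    bound memory the way a real streaming scheduler would.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(lambda t: tile_surface_area(t, cell),
                           stream_tiles(dem, size))
        return sum(partial)
```

Because each cell falls in exactly one tile, `estimate_surface_area(dem)` should agree with `tile_surface_area(dem)` applied to the whole grid, up to floating-point rounding. In CPython, threads will not speed up this CPU-bound loop; the sketch shows only the structure of tiling and independent workers, not the performance behaviour reported in the paper.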
