MapReduce Algorithms for GIS Polygonal Overlay Processing

Polygon overlay is one of the more complex operations in computational geometry. It is applied in many fields, such as Geographic Information Systems (GIS), computer graphics, and VLSI CAD. Sequential algorithms for this problem abound in the literature, but distributed algorithms are scarce, especially for the MapReduce platform. In GIS, spatial data files tend to be large (on the order of gigabytes), and the underlying overlay computation is highly irregular and compute-intensive. The MapReduce paradigm is now standard in industry and academia for processing large-scale data. Motivated by the MapReduce programming model, we revisit the distributed polygon overlay problem and its implementation on the MapReduce platform. Our algorithms are geared towards maximizing local processing and minimizing the communication overhead inherent in the shuffle and sort phases of MapReduce. In experiments with two data sets, we achieved up to a 22x speedup on data set 1 using 64 CPU cores.
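The abstract does not spell out the algorithms themselves, so the following is only an illustrative sketch of one common way to phrase the filter step of polygon overlay in map/shuffle/reduce terms. It is not the authors' method. It assumes a uniform grid partition, uses minimum bounding rectangles (MBRs) to find candidate polygon pairs, and simulates the shuffle in memory. The cell size `CELL`, the layer labels `"base"` and `"overlay"`, and all function names are hypothetical. Actual polygon clipping of each candidate pair is left out.

```python
from collections import defaultdict
from itertools import product

CELL = 10.0  # hypothetical grid cell size (an assumption for this sketch)


def mbr(polygon):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a vertex list."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))


def cells_for(box, cell=CELL):
    """Yield every grid cell (cx, cy) that the bounding box touches."""
    x0, y0, x1, y1 = box
    for cx in range(int(x0 // cell), int(x1 // cell) + 1):
        for cy in range(int(y0 // cell), int(y1 // cell) + 1):
            yield (cx, cy)


def map_phase(layer, polygons):
    """Map: emit (cell, (layer, polygon id, MBR)) for each cell the MBR touches."""
    for pid, poly in polygons.items():
        box = mbr(poly)
        for c in cells_for(box):
            yield c, (layer, pid, box)


def shuffle(pairs):
    """In-memory stand-in for the shuffle and sort phase: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(cell_key, records, cell=CELL):
    """Reduce: within one cell, pair base and overlay polygons whose MBRs intersect."""
    base = [r for r in records if r[0] == "base"]
    over = [r for r in records if r[0] == "overlay"]
    for (_, bid, b), (_, oid, o) in product(base, over):
        ix0, iy0 = max(b[0], o[0]), max(b[1], o[1])
        ix1, iy1 = min(b[2], o[2]), min(b[3], o[3])
        if ix0 > ix1 or iy0 > iy1:
            continue  # MBRs do not intersect
        # Reference-point rule: report the pair only in the cell that holds
        # the lower-left corner of the MBR intersection, to avoid duplicates.
        if (int(ix0 // cell), int(iy0 // cell)) == cell_key:
            yield (bid, oid)


def candidate_pairs(base_layer, overlay_layer):
    """Run map, shuffle, and reduce locally and return sorted candidate pairs."""
    emitted = list(map_phase("base", base_layer))
    emitted += list(map_phase("overlay", overlay_layer))
    result = []
    for key, recs in shuffle(emitted).items():
        result.extend(reduce_phase(key, recs))
    return sorted(result)
```

In this sketch each reducer works only on the polygons that fall in its own cell, which is one way to keep processing local. A polygon spanning several cells is replicated to each of them, so cell size trades replication (shuffle volume) against per-cell work.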