A MapReduce-based efficient H-bucket PMR quadtree spatial index

Most MapReduce/Hadoop-based spatial indexes rely on either a non-disjoint decomposition or a data-dependent disjoint decomposition of space. Quadtree indexes in MapReduce that use a regular disjoint decomposition treat different forms of spatial data as point data: lines, curves, polygons and other higher-dimensional objects are transformed into points through a mapping process. Although this mapping makes index building easy, it is not suitable for answering search queries. This paper proposes the H-bucket PMR Quadtree, a MapReduce-based parallel implementation of the existing bucket PMR Quadtree for handling curvilinear or polygonal map data. The proposed index uses two levels of indexing: a global index over the decomposed dataset distributed among cluster nodes, which supports parallel index building, and a local bucket PMR Quadtree maintained by each participating cluster node. The proposed index is compared with the state-of-the-art MapReduce-based R+-tree index and with default (non-indexed) key-value storage in Hadoop, in terms of index build time and the performance of spatial queries such as line search and range search. The experimental results demonstrate the effectiveness of the proposed index in a MapReduce environment.
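The two-level scheme can be illustrated with a small in-memory sketch. It is not the authors' implementation and makes several assumptions: the global index is taken to be a regular g x g grid (the abstract does not say how the global partitioning is done), the map and reduce phases are simulated with ordinary loops rather than the Hadoop API, and each local tree follows the PMR splitting rule in which a leaf that overflows its bucket capacity is split once rather than recursively. The names `PMRQuadtree`, `build_index`, `range_search` and `grid_cells` are hypothetical.

```python
from collections import defaultdict

def rects_overlap(a, b):
    """True if two closed axis-aligned rectangles (xmin, ymin, xmax, ymax) share a point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def seg_intersects_rect(seg, rect):
    """Liang-Barsky clipping: True if the segment touches the closed rectangle."""
    (x1, y1), (x2, y2) = seg
    xmin, ymin, xmax, ymax = rect
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x1 - xmin), (dx, xmax - x1), (-dy, y1 - ymin), (dy, ymax - y1)):
        if p == 0:
            if q < 0:              # parallel to this edge and outside it
                return False
        elif p < 0:                # entering the rectangle
            t0 = max(t0, q / p)
        else:                      # leaving the rectangle
            t1 = min(t1, q / p)
        if t0 > t1:
            return False
    return True

class Node:
    def __init__(self, rect, depth):
        self.rect, self.depth = rect, depth
        self.segs = []             # segments stored in this leaf
        self.children = None       # four sub-quadrants once split

class PMRQuadtree:
    """Local bucket PMR quadtree over one partition (hypothetical sketch)."""
    def __init__(self, rect, capacity=4, max_depth=16):
        self.root = Node(rect, 0)
        self.capacity, self.max_depth = capacity, max_depth

    def insert(self, seg):
        self._insert(self.root, seg)

    def _insert(self, node, seg):
        if not seg_intersects_rect(seg, node.rect):
            return
        if node.children is not None:
            for child in node.children:
                self._insert(child, seg)
            return
        node.segs.append(seg)
        # Assumed PMR rule: an overflowing leaf is split once, not recursively.
        if len(node.segs) > self.capacity and node.depth < self.max_depth:
            self._split(node)

    def _split(self, node):
        xmin, ymin, xmax, ymax = node.rect
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        quads = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                 (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
        node.children = [Node(q, node.depth + 1) for q in quads]
        for seg in node.segs:      # redistribute to every child the segment crosses
            for child in node.children:
                if seg_intersects_rect(seg, child.rect):
                    child.segs.append(seg)
        node.segs = []

    def range_query(self, window):
        found, stack = set(), [self.root]
        while stack:
            node = stack.pop()
            if not rects_overlap(node.rect, window):
                continue
            if node.children is not None:
                stack.extend(node.children)
            else:
                found.update(s for s in node.segs if seg_intersects_rect(s, window))
        return found

def grid_cells(space, g):
    """Assumed global index: a regular g x g disjoint grid over the data space."""
    xmin, ymin, xmax, ymax = space
    w, h = (xmax - xmin) / g, (ymax - ymin) / g
    return {(i, j): (xmin + i * w, ymin + j * h, xmin + (i + 1) * w, ymin + (j + 1) * h)
            for i in range(g) for j in range(g)}

def build_index(segments, space, g=4, capacity=4):
    cells = grid_cells(space, g)
    groups = defaultdict(list)
    for seg in segments:           # simulated "map": emit (cell id, segment) per crossed cell
        for cid, rect in cells.items():
            if seg_intersects_rect(seg, rect):
                groups[cid].append(seg)
    local = {}
    for cid, segs in groups.items():  # simulated "reduce": one local tree per cell
        tree = PMRQuadtree(cells[cid], capacity)
        for seg in segs:
            tree.insert(seg)
        local[cid] = tree
    return cells, local

def range_search(cells, local, window):
    result = set()
    for cid, tree in local.items():   # consult only cells overlapping the window
        if rects_overlap(cells[cid], window):
            result |= tree.range_query(window)  # set union drops replicated segments
    return result

segs = [((0, 0), (10, 10)), ((1, 8), (2, 9)), ((6, 1), (9, 2))]
cells, local = build_index(segs, (0, 0, 10, 10), g=2, capacity=1)
print(range_search(cells, local, (0, 0, 3, 3)))  # {((0, 0), (10, 10))}
```

Replicating a boundary-crossing segment into every cell it intersects keeps each partition self-contained, so each node can answer its share of a query without contacting the others; the cost is duplicate hits, which the final set union removes.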