Parallel bulk-loading of spatial data with MapReduce: An R-tree case

Current literature on parallel bulk-loading of R-tree index has the disadvantage that the quality of produced spatial index decrease considerably as the parallelism increases. To solve this problem, a novel method of bulk-loading spatial data using the popular MapReduce framework is proposed. MapReduce combines Hilbert curve and random sampling method to parallel partition and sort spatial data, thus it balances the number of spatial data in each partition. Then the bottom-up method is introduced to simplify and accelerate the sub-index construction in each partition. Three area metrics are used to test the quality of generated index under different partitions. The extensive experiments show that the generated R-trees have the similar quality with the generated R-tree using sequential bulk-loading method, while the execution time is reduced considerably by exploiting parallelism.

[1]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[2]  Scott T. Leutenegger,et al.  Efficient Bulk-Loading of Gridfiles , 1997, IEEE Trans. Knowl. Data Eng..

[3]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[4]  Klaus H. Hinrichs,et al.  Efficient Bulk Operations on Dynamic R-Trees , 1999, Algorithmica.

[5]  Bernhard Seeger,et al.  An Evaluation of Generic Bulk Loading Techniques , 2001, VLDB.

[6]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[7]  Mario A. López,et al.  STR: a simple and efficient algorithm for R-tree packing , 1997, Proceedings 13th International Conference on Data Engineering.

[8]  Bernhard Seeger,et al.  A Generic Approach to Bulk Loading Multidimensional Index Structures , 1997, VLDB.

[9]  Nick Roussopoulos,et al.  Direct spatial search on pictorial databases using packed R-trees , 1985, SIGMOD Conference.

[10]  Peter J. H. King,et al.  Using Space-Filling Curves for Multi-dimensional Indexing , 2000, BNCOD.

[11]  Christian Böhm,et al.  Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations , 1998, EDBT.

[12]  Christos Faloutsos,et al.  On packing R-trees , 1993, CIKM '93.

[13]  A. Guttman,et al.  A Dynamic Index Structure for Spatial Searching , 1984, SIGMOD 1984.

[14]  Yannis Manolopoulos,et al.  Parallel bulk-loading of spatial data , 2003, Parallel Comput..

[15]  Marco Patella,et al.  Bulk Loading the M-tree , 2001 .

[16]  Naphtali Rishe,et al.  Experiences on Processing Spatial Data with MapReduce , 2009, SSDBM.