Packing R-trees with Space-filling Curves

The massive amount of data and large variety of data distributions in the big data era call for access methods that are efficient in both query processing and index management, and over both practical and worst-case workloads. To address this need, we revisit two classic multidimensional access methods—the R-tree and the space-filling curve. We propose a novel R-tree packing strategy based on space-filling curves. This strategy produces R-trees with an asymptotically optimal I/O complexity for window queries in the worst case. Experiments show that our R-trees are highly efficient in querying both real and synthetic data of different distributions. The proposed strategy is also simple to parallelize, since it relies only on sorting. We propose a parallel algorithm for R-tree bulk-loading based on the proposed packing strategy and analyze its performance under the massively parallel communication model. To handle dynamic data updates, we further propose index update algorithms that process data insertions and deletions without compromising the optimal query I/O complexity. Experimental results confirm the effectiveness and efficiency of the proposed R-tree bulk-loading and updating algorithms over large data sets.
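The abstract does not spell out the packing procedure, so the following is only a generic sketch of sort-based R-tree packing, not the authors' algorithm. It assumes 2-D points with non-negative integer coordinates, uses a Z-order (Morton) curve as the space-filling curve, and packs every `capacity` consecutive entries into one node. The names `z_value`, `pack_level`, and `bulk_load` are hypothetical, and this plain variant is not claimed to carry the worst-case guarantee stated above.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return z


def union(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle (MBR) of a non-empty list of rectangles."""
    return (
        min(r[0] for r in rects),
        min(r[1] for r in rects),
        max(r[2] for r in rects),
        max(r[3] for r in rects),
    )


def pack_level(rects: List[Rect], capacity: int) -> List[Rect]:
    """Group every `capacity` consecutive rectangles into one parent MBR."""
    return [union(rects[i:i + capacity]) for i in range(0, len(rects), capacity)]


def bulk_load(points: List[Point], capacity: int) -> List[List[Rect]]:
    """Return the node MBRs of each tree level, leaves first, root last.

    Assumption: the only ordering step is a sort by curve value, which is
    why this style of packing is straightforward to parallelize.
    """
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not points:
        return []
    ordered = sorted(points, key=lambda p: z_value(p[0], p[1]))
    # Treat each point as a degenerate rectangle, then pack bottom-up.
    level = pack_level([(x, y, x, y) for x, y in ordered], capacity)
    levels = [level]
    while len(level) > 1:
        level = pack_level(level, capacity)
        levels.append(level)
    return levels


if __name__ == "__main__":
    pts = [(0, 0), (3, 3), (1, 0), (2, 2), (0, 1), (3, 2), (1, 1), (2, 3)]
    for depth, nodes in enumerate(bulk_load(pts, capacity=4)):
        print(depth, nodes)
```

Because each level is produced by a single pass over an already sorted sequence, swapping in a different curve (for example a Hilbert curve) only changes the sort key.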
