Speeding up construction of PMR quadtree-based spatial indexes

Abstract. Spatial indexes, such as those based on the quadtree, are important in spatial databases for efficient execution of queries involving spatial constraints, especially when the queries involve spatial joins. In this paper we present a number of techniques for speeding up the construction of quadtree-based spatial indexes, specifically the PMR quadtree, which can index arbitrary spatial data. We assume that the quadtree is implemented as a “linear quadtree”, a disk-resident representation that stores the objects contained in the leaf nodes of the quadtree in a linear index (e.g., a B-tree) ordered by a space-filling curve. We present two complementary techniques: an improved insertion algorithm and a bulk-loading method. The bulk-loading method can be extended to handle bulk insertions into an existing PMR quadtree. We make some analytical observations about the I/O and CPU costs of our PMR quadtree bulk-loading algorithm, and conduct an extensive empirical study of the techniques presented in the paper. Our techniques yield significant speedups over traditional quadtree building methods, even when the main-memory buffer is very small compared to the size of the resulting quadtrees.
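
To make the linear quadtree idea concrete, the following is a minimal sketch of how leaf blocks might be keyed by a space-filling curve before being placed in a linear index. It assumes the Morton (Z-order) curve, which is only one possible choice; the abstract does not say which curve is used. The function `morton_key`, the `leaves` dictionary, and the object ids are hypothetical names introduced for illustration, and a sorted Python list stands in for the B-tree.

```python
def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x go to even positions, bits of y to odd positions
    (Morton / Z-order; an assumed curve, not taken from the paper).
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Hypothetical leaf blocks: lower-left corner (x, y) -> ids of objects in the block.
leaves = {
    (2, 0): ["a"],
    (0, 0): ["b", "c"],
    (0, 2): ["d"],
    (2, 2): ["e"],
}

# A sorted list stands in for the linear index: entries are ordered by curve key.
ordered = sorted(leaves.items(), key=lambda kv: morton_key(*kv[0]))
ordered_corners = [corner for corner, _ in ordered]
```

Sorting by the interleaved key places spatially adjacent blocks near one another in the one-dimensional order, which is what lets a conventional ordered index hold the quadtree leaves.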
