GBI: A Generalized R-Tree Bulk-Insertion Strategy

Much recent work has studied bulk loading of large data sets into multidimensional index structures. In this paper, we address the related problem of bulk insertion into existing index structures, focusing on R-trees, an important class of index structures widely used in commercial database systems. Whereas the current technique inserts new data one item at a time, we propose a technique that inserts an entire incoming dataset into an active R-tree in bulk. This technique, called GBI (Generalized Bulk Insertion), partitions the new dataset into clusters and outliers, constructs an R-tree (small tree) from each cluster, identifies and prepares suitable locations in the original R-tree (large tree) for insertion, and finally inserts the small trees and the outliers into the large tree in bulk. Our experimental studies demonstrate that GBI performs especially well (over 200% better than the existing technique) for randomly located data and for real datasets that contain few natural clusters, while consistently outperforming the alternative technique in all other cases.
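The abstract names GBI's steps but not how each one is carried out. The following is a minimal sketch under stated assumptions, not the paper's algorithm. A uniform grid stands in for GBI's clustering step (`cell_size` and `min_cluster_size` are hypothetical parameters). Each cluster's minimum bounding rectangle (MBR) stands in for the root of its small tree. The large tree is reduced to a flat list of candidate node MBRs. A target node is chosen with Guttman's least-area-enlargement rule, which is only an assumption about how "suitable locations" might be identified. The sketch plans where each small tree and outlier would go; it does not build the small trees or restructure the large tree (for example, by splitting nodes).

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

    @staticmethod
    def of_points(pts: List[Point]) -> "Rect":
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        return Rect(min(xs), min(ys), max(xs), max(ys))


def partition(points: List[Point], cell_size: float, min_cluster_size: int):
    """Hypothetical grid-based stand-in for GBI's clustering step.

    Points sharing a grid cell form one group. Groups with at least
    min_cluster_size points become clusters; the rest become outliers.
    """
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        cells[(math.floor(x / cell_size), math.floor(y / cell_size))].append((x, y))
    clusters: List[List[Point]] = []
    outliers: List[Point] = []
    for key in sorted(cells):  # sorted for deterministic output order
        group = cells[key]
        if len(group) >= min_cluster_size:
            clusters.append(group)
        else:
            outliers.extend(group)
    return clusters, outliers


def choose_target(nodes: List[Rect], r: Rect) -> int:
    """Index of the node needing the least area enlargement to cover r
    (ties broken by smaller node area), as in Guttman's ChooseLeaf."""
    return min(range(len(nodes)),
               key=lambda i: (nodes[i].union(r).area() - nodes[i].area(),
                              nodes[i].area()))


def plan_bulk_insert(nodes: List[Rect], points: List[Point],
                     cell_size: float = 10.0, min_cluster_size: int = 3):
    """Assign each cluster MBR (proxy for a small tree's root) and each
    outlier point to a target node of the simplified large tree."""
    clusters, outliers = partition(points, cell_size, min_cluster_size)
    small_tree_targets = []
    for cluster in clusters:
        mbr = Rect.of_points(cluster)
        small_tree_targets.append((mbr, choose_target(nodes, mbr)))
    # Outliers are treated as degenerate (zero-area) rectangles.
    outlier_targets = [(p, choose_target(nodes, Rect(p[0], p[1], p[0], p[1])))
                       for p in outliers]
    return small_tree_targets, outlier_targets


if __name__ == "__main__":
    large_tree_nodes = [Rect(0, 0, 10, 10), Rect(50, 50, 60, 60)]
    new_points = [(1, 1), (2, 2), (3, 3), (55, 55), (56, 56), (57, 57), (90, 90)]
    small, out = plan_bulk_insert(large_tree_nodes, new_points)
    print(small)
    print(out)
```

With these inputs, the two dense groups become clusters that map to nodes 0 and 1, and the isolated point (90, 90) is an outlier assigned to node 1, the node whose MBR grows least to cover it. Placing a whole cluster at once is the point of the small-tree-large-tree idea: one placement decision replaces one decision per object.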
