Sort-based query-adaptive loading of R-trees

Bulk loading of R-trees has been an important problem in academia and industry for more than twenty years. Current algorithms create R-trees without any information about the expected query profile, even though query profiles are extremely useful for designing efficient indexes. In this paper, we address this deficiency and present query-adaptive algorithms for building R-trees tailored to a given query profile. Since optimal R-tree loading is NP-hard (even without tuning the structure to a query profile), we provide efficient, easy-to-implement heuristics. Our sort-based algorithms for query-adaptive loading consist of two steps. First, we identify sort orders that yield better R-trees than those obtained from standard space-filling curves. Second, for a given sort order, we propose a dynamic programming algorithm that generates R-trees in linear runtime. Our experimental results confirm that our algorithms generally create significantly better R-trees than those obtained from standard sort-based loading algorithms, even when the query profile is unknown.
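
The second step can be illustrated with a minimal sketch. It is not the paper's algorithm, whose cost model and details are not given in this abstract. It assumes 2-D rectangles that are already sorted, leaf sizes bounded by hypothetical parameters `b` and `B`, and a simple cost proxy for window queries: the area of each leaf's bounding box extended by an assumed average query extent `qw` by `qh`. All function and parameter names are illustrative. For fixed `b` and `B`, the dynamic program does O(n · B) work, which is linear in the number of rectangles.

```python
from math import inf

def mbr_cost(xmin, ymin, xmax, ymax, qw, qh):
    # Assumed cost proxy: area of the bounding box extended by the
    # average query extent (qw, qh). Not taken from the paper.
    return (xmax - xmin + qw) * (ymax - ymin + qh)

def partition_sorted(rects, b, B, qw=0.0, qh=0.0):
    """Split an already sorted list of (xmin, ymin, xmax, ymax) rectangles
    into contiguous leaves holding between b and B rectangles each,
    minimizing the summed leaf cost. Returns (leaves, total_cost), where
    leaves is a list of half-open index ranges (start, end)."""
    n = len(rects)
    opt = [inf] * (n + 1)   # opt[i]: best cost for the first i rectangles
    back = [-1] * (n + 1)   # back[i]: start index of the last leaf
    opt[0] = 0.0
    for i in range(1, n + 1):
        # Grow the candidate last leaf backwards from rects[i - 1],
        # updating its bounding box incrementally.
        xmin = ymin = inf
        xmax = ymax = -inf
        for size in range(1, min(B, i) + 1):
            r = rects[i - size]
            xmin, ymin = min(xmin, r[0]), min(ymin, r[1])
            xmax, ymax = max(xmax, r[2]), max(ymax, r[3])
            start = i - size
            if size < b or opt[start] == inf:
                continue  # leaf too small, or prefix cannot be partitioned
            cost = opt[start] + mbr_cost(xmin, ymin, xmax, ymax, qw, qh)
            if cost < opt[i]:
                opt[i], back[i] = cost, start
    if opt[n] == inf:
        raise ValueError("no partition satisfies the leaf size bounds")
    # Walk the back pointers to recover the leaf boundaries.
    leaves, i = [], n
    while i > 0:
        leaves.append((back[i], i))
        i = back[i]
    leaves.reverse()
    return leaves, opt[n]

# Two well-separated clusters: the split should fall between them.
rects = [(0, 0, 1, 1), (1, 0, 2, 1),
         (10, 10, 11, 11), (11, 10, 12, 11), (12, 10, 13, 11)]
leaves, total = partition_sorted(rects, b=2, B=3)
```

Under this sketch, a query profile enters only through `qw` and `qh`; a larger assumed query extent penalizes having many leaves, while a point-query profile (`qw = qh = 0`) reduces the cost to plain bounding-box area.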
