Performance analysis of R*-trees with arbitrary node extents

Existing analysis of R-trees is inadequate for several traditional and emerging applications, such as temporal, spatio-temporal, and multimedia databases, because it assumes that the extents of a node are identical on all dimensions, an assumption that does not hold in these domains. We propose analytical models that accurately predict R*-tree performance without this assumption. Our derivation is based on the novel concept of an extent regression function, which computes the node extents as a function of the number of node splits. A detailed experimental evaluation shows that the proposed models are accurate, even in cases where previous methods fail completely.
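To illustrate why per-dimension node extents matter, the sketch below uses the standard uniform-query estimate from the earlier R-tree cost-model literature: in a unit data space, a query window with extents q_i intersects a node with extents s_i with probability approximately the product of (s_i + q_i) over all dimensions, ignoring boundary effects. This is not the model proposed in this paper, and the function names `intersection_probability` and `expected_node_accesses` are hypothetical, chosen only for this example.

```python
from math import prod

def intersection_probability(node_extents, query_extents):
    """Approximate probability that a uniformly placed query window
    intersects a node MBR in the unit space (boundary effects ignored).
    Each per-dimension factor is capped at 1.0 (assumption of this sketch)."""
    return prod(min(1.0, s + q) for s, q in zip(node_extents, query_extents))

def expected_node_accesses(nodes, query_extents):
    """Sum the intersection probabilities over all nodes of one tree level."""
    return sum(intersection_probability(n, query_extents) for n in nodes)

# Two nodes with the same area (0.01) but different shapes.
square_node = (0.1, 0.1)
elongated_node = (0.5, 0.02)
query = (0.1, 0.1)

p_square = intersection_probability(square_node, query)
p_elongated = intersection_probability(elongated_node, query)
total = expected_node_accesses([square_node, elongated_node], query)
```

Both nodes cover the same area, yet the elongated one is noticeably more likely to be accessed by the same query, which is why a model that assumes identical extents on all dimensions can misestimate cost when nodes are far from square.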
