An Analysis Framework for Access Methods

Designing and tuning access methods (AMs) has always been more of a black art than a rigorous discipline, with performance assessments mostly reduced to presenting bottom-line runtime or I/O numbers. This paper presents an analysis framework for AMs that defines performance metrics which are more meaningful than bottom-line numbers and thereby allow the AM designer to detect and isolate deficiencies in an AM design. The analysis process takes a workload--a tree and a set of queries--as input and provides metrics that characterize the performance of each query as well as that of the tree structure and the structure-shaping aspects of the AM implementation. Central to the framework is the use of the optimal behavior--which can be approximated relatively efficiently--as a point of reference against which the actual observed performance is measured. The performance metrics themselves reflect the fundamental performance-relevant properties of the input tree. The framework applies to most balanced tree-structured AMs and is not restricted to particular types of data or queries. It is implemented in "amdb," a comprehensive graphical design tool for AMs that are constructed on top of the Generalized Search Tree abstraction. Amdb complements the analysis framework with visualization and debugging functionality, allowing the AM designer to investigate the source of the deficiencies brought to light by the analysis framework.
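To make the idea of measuring observed performance against an optimal reference concrete, here is a minimal Python sketch. It is an illustration only, not the metrics defined in the paper: the names `QueryTrace`, `optimal_leaf_accesses`, `excess_accesses`, and `analyze` are hypothetical, and the "optimal" value is assumed to be the number of leaf pages needed if a query's results were packed perfectly into full leaves.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class QueryTrace:
    """Hypothetical per-query record taken from a workload run."""
    name: str
    result_size: int      # number of items the query returned
    leaves_accessed: int  # leaf pages actually read by the traversal


def optimal_leaf_accesses(result_size: int, leaf_capacity: int) -> int:
    # Assumed reference point: results packed perfectly into full leaves.
    if result_size <= 0:
        return 0
    return ceil(result_size / leaf_capacity)


def excess_accesses(trace: QueryTrace, leaf_capacity: int) -> int:
    # Leaf reads beyond the assumed optimum; larger values flag queries
    # for which the tree performs worse than the reference.
    return trace.leaves_accessed - optimal_leaf_accesses(
        trace.result_size, leaf_capacity
    )


def analyze(workload: list[QueryTrace], leaf_capacity: int) -> dict[str, int]:
    # Per-query metric: excess leaf accesses relative to the reference.
    return {t.name: excess_accesses(t, leaf_capacity) for t in workload}


if __name__ == "__main__":
    workload = [QueryTrace("q1", 250, 9), QueryTrace("q2", 0, 2)]
    print(analyze(workload, leaf_capacity=100))
```

A per-query breakdown of this kind, rather than a single total I/O count, is what would let a designer see which queries deviate most from the reference and investigate them further.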
