How to avoid building DataBlades(R) that know the value of everything and the cost of nothing

The object-relational database management system (ORDBMS) offers many potential benefits for scientific, multimedia, and financial applications. However, work remains in integrating domain-specific class libraries into ORDBMS query processing. A major problem is that the standard mechanisms for query selectivity estimation, taken from relational database systems, rely on properties specific to the standard data types; creating new mechanisms remains extremely difficult because the software interfaces provided by vendors are relatively low-level. We discuss extensions of the generalized search tree, or GiST, to support a higher-level but less type-specific approach. Specifically, we discuss the computation of selectivity estimates with confidence intervals using a variety of index-based approaches, and we present results from an experimental comparison of these methods with several estimators from the literature.
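
To make the notion of a selectivity estimate with a confidence interval concrete, the following is a minimal sketch under stated assumptions. It is not one of the index-based methods discussed here: it draws a plain uniform random sample from an in-memory list and bounds the error with Hoeffding's inequality. The function name `estimate_selectivity` and its parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def estimate_selectivity(rows, predicate, sample_size, delta=0.05, seed=0):
    """Estimate the fraction of `rows` that satisfy `predicate`.

    Illustrative only: uses uniform sampling with replacement over an
    in-memory list, not an index-assisted traversal.
    Returns (estimate, lower, upper), where [lower, upper] covers the true
    selectivity with probability at least 1 - delta.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rng = random.Random(seed)
    # Sample with replacement so the draws are independent.
    sample = [rng.choice(rows) for _ in range(sample_size)]
    hits = sum(1 for row in sample if predicate(row))
    p_hat = hits / sample_size
    # Hoeffding: P(|p_hat - p| >= eps) <= 2 * exp(-2 * n * eps^2).
    # Solving 2 * exp(-2 * n * eps^2) = delta for eps gives the half-width.
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * sample_size))
    # Selectivity is a fraction, so clamp the interval to [0, 1].
    return p_hat, max(0.0, p_hat - eps), min(1.0, p_hat + eps)


if __name__ == "__main__":
    table = list(range(1000))
    est, lo, hi = estimate_selectivity(table, lambda x: x < 250, sample_size=2000)
    print(f"selectivity ~ {est:.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The Hoeffding bound needs no knowledge of the data type beyond the ability to evaluate the predicate, which is why it is used in this sketch. The price is a conservative interval whose half-width shrinks only as the inverse square root of the sample size.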
