Estimating Block Selectivities

Abstract. In many data base performance studies there is a need to estimate the number of records of a file that qualify for a query, as well as the number of blocks of secondary storage that contain these records. In this paper we present a model of data base contents and of data placement on devices. We extend a multivariate statistical model, used in [5] to estimate record selectivities, to model the distribution of qualifying records among the blocks of secondary storage. We then show how to obtain estimates of block selectivities, and we compare our estimates with those of previous models.
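To make the notion of a block selectivity estimate concrete, the sketch below computes two classical baseline estimates of the number of blocks touched by k qualifying records. The abstract does not say which previous models are compared; that the formulas of Cardenas [10] and Yao [12] from the reference list are among them is an assumption made here. The function names `cardenas_blocks` and `yao_blocks` are illustrative and do not come from the paper, and neither function implements the multivariate model the paper proposes. Both formulas assume that qualifying records are spread uniformly and independently over the blocks.

```python
from math import comb


def cardenas_blocks(m: int, k: int) -> float:
    """Expected number of distinct blocks hit when k records are drawn
    WITH replacement from a file stored in m equally likely blocks."""
    if m <= 0 or k < 0:
        raise ValueError("need m > 0 and k >= 0")
    return m * (1.0 - (1.0 - 1.0 / m) ** k)


def yao_blocks(n: int, m: int, k: int) -> float:
    """Expected number of distinct blocks hit when k of n records are drawn
    WITHOUT replacement, with exactly n/m records in each of m blocks."""
    if m <= 0 or n <= 0 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    per_block = n // m
    # Probability that one given block holds none of the k chosen records.
    # comb() returns 0 when k > n - per_block, so every block is then hit.
    p_block_missed = comb(n - per_block, k) / comb(n, k)
    return m * (1.0 - p_block_missed)
```

For a file of 100 records in 10 blocks and k = 2 qualifying records, `yao_blocks(100, 10, 2)` is about 1.91 and `cardenas_blocks(10, 2)` is about 1.90. The two estimates agree for k = 1 and drift apart as k grows, since sampling without replacement can never pick the same record twice.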

[1] Stavros Christodoulakis, et al. Estimating block transfers and join sizes, 1983, SIGMOD '83.

[2] Philippe Richard, et al. Evaluation of the size of a query expressed in relational algebra, 1981, SIGMOD '81.

[3] Stavros Christodoulakis. Issues in Query Evaluation, 1982.

[4] Stavros Christodoulakis, et al. Estimating record selectivities, 1983, Inf. Syst.

[5] Mario Schkolnick, et al. The Optimal Selection of Secondary Indices for Files, 1975, Inf. Syst.

[6] E. F. Codd, et al. A relational model of data for large shared data banks, 1970, CACM.

[7] Donald F. Specht, et al. Generation of Polynomial Discriminant Functions for Pattern Recognition, 1967, IEEE Trans. Electron. Comput.

[8] Eugene Wong, et al. Query processing in SDD-1: a system for distributed databases, 1979.

[9] Julius T. Tou, et al. Pattern Recognition Principles, 1974.

[10] Alfonso F. Cardenas. Analysis and performance of inverted data base structures, 1975, CACM.

[11] Mario Schkolnick. A Survey of Physical Database Design Methodology and Techniques, 1978, VLDB.

[12] S. B. Yao, et al. Approximating block accesses in database organizations, 1977, CACM.

[13] Richard O. Duda, et al. Pattern classification and scene analysis, 1974, A Wiley-Interscience publication.

[14] James B. Rothnie, et al. Attribute based file organization in a paged memory environment, 1974, CACM.

[15] S. Christodoulakis. A Multivariate Statistical Model for Data Base Performance Evaluation, 1982.

[16] Robert Demolombe, et al. Estimation of the Number of Tuples Satisfying a Query Expressed in Predicate Calculus Language, 1980, VLDB.

[17] Patricia G. Selinger, et al. Access path selection in a relational database management system, 1979, SIGMOD '79.

[18] William Palin Elderton. Frequency curves and correlation, 1928.

[19] G. Sebestyen, et al. An Algorithm for Non-Parametric Pattern Recognition, 1966, IEEE Trans. Electron. Comput.

[20] Athanasios Papoulis, et al. Probability, Random Variables and Stochastic Processes, 1965.

[21] Stavros Christodoulakis, et al. Message files, 1982, TOIS.