Query size estimation by adaptive sampling (extended abstract)

We present an adaptive random sampling algorithm for estimating the size of general queries. The algorithm can be used for any query $Q$ over a database $D$ such that 1) for some $n$, the answer to $Q$ can be partitioned into $n$ disjoint subsets $Q_1, Q_2, \ldots, Q_n$; 2) for $1 \le i \le n$, the size of $Q_i$ is bounded by some function $b(D, Q)$; and 3) there is some algorithm that computes the size of $Q_i$, where $i$ is chosen at random. We consider the performance of the algorithm on three special classes of queries: join queries, transitive closure queries, and general recursive Datalog queries.
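The three conditions can be made concrete with a small sketch. The abstract does not state the sampling or stopping rule, so everything below is an assumption for illustration. The sketch draws partition indices at random with replacement, keeps a running sum of the sampled sizes $|Q_i|$, and stops once that sum reaches `k * bound` (or a cap of `max_samples` draws). It then scales the mean sampled size by $n$. Reading "adaptive" this way (many samples when partitions are small, few when they are large) is our interpretation. The names `adaptive_sample_estimate` and `estimate_join_size` and the parameters `k` and `max_samples` are hypothetical. The join instantiation partitions the answer of $R(a, b) \bowtie S(b, c)$ by the tuples of $R$. Under that partitioning, $|Q_i|$ is the number of $S$ tuples sharing the $i$-th $R$ tuple's join key, and the largest key frequency in $S$ serves as $b(D, Q)$.

```python
import random
from collections import Counter
from typing import Callable, Optional


def adaptive_sample_estimate(
    n: int,
    size_of: Callable[[int], int],
    bound: int,
    k: float = 10.0,
    max_samples: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate |Q| = |Q_1| + ... + |Q_n| from randomly sampled partitions.

    size_of(i) returns the size of partition i (0-based) and bound plays the
    role of b(D, Q). Hypothetical stopping rule: sample until the sampled
    sizes sum to at least k * bound, or until max_samples draws were made.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if n <= 0 or bound <= 0:
        return 0.0  # no partitions, or every partition is empty
    if rng is None:
        rng = random.Random()
    if max_samples is None:
        max_samples = n  # hypothetical default cap on the number of draws
    max_samples = max(1, max_samples)
    threshold = k * bound
    total, m = 0, 0
    while total < threshold and m < max_samples:
        total += size_of(rng.randrange(n))  # uniform draw, with replacement
        m += 1
    return n * total / m  # mean sampled size scaled up to n partitions


def estimate_join_size(R, S, k=10.0, rng=None):
    """Estimate the size of R(a, b) joined with S(b, c) on b.

    Partition i holds the result tuples produced by R[i], so its size is the
    number of S tuples with R[i]'s join key; the largest key frequency in S
    bounds every partition.
    """
    freq = Counter(b for b, _ in S)  # stands in for an index on S.b
    bound = max(freq.values(), default=0)
    return adaptive_sample_estimate(
        len(R), lambda i: freq[R[i][1]], bound, k=k, rng=rng
    )


if __name__ == "__main__":
    R = [(x, x % 5) for x in range(1000)]  # hypothetical relation R(a, b)
    S = [(y % 5, y) for y in range(50)]    # hypothetical relation S(b, c)
    print(estimate_join_size(R, S, rng=random.Random(0)))
```

In this toy example every tuple of `R` matches exactly 10 tuples of `S`. Every sampled partition therefore has the same size, and the script prints `10000.0`, the exact join size. With skewed key frequencies the estimate varies from run to run. The threshold `k` trades sampling cost for accuracy under this assumed rule.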
