Query Size Estimation by Adaptive Sampling

Abstract. We present an adaptive random sampling algorithm for estimating the size of general queries. The algorithm can be used for any query Q over a database D such that (1) for some n, the answer to Q can be partitioned into n disjoint subsets Q_1, Q_2, ..., Q_n; (2) for 1 ≤ i ≤ n, the size of Q_i is bounded by some function b(D, Q); and (3) there is some algorithm by which we can compute the size of Q_i, where i is chosen randomly. We consider the performance of the algorithm on three special cases: join queries, transitive closure queries, and general recursive Datalog queries.
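
As a rough illustration of the three conditions above, the following Python sketch samples subsets of the answer at random, sums their sizes, and scales the sample mean by n. It is not the paper's exact procedure: the abstract does not state a stopping rule, so the `threshold` and `max_samples` parameters are assumptions made for illustration, as are the function names and the join helper `join_partition_sizes`.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence


def estimate_query_size(
    n: int,
    partition_size: Callable[[int], int],
    threshold: float,
    max_samples: int,
    rng: Optional[random.Random] = None,
) -> float:
    """Estimate the total answer size from randomly sampled subsets.

    n              -- number of disjoint subsets of the answer (condition 1)
    partition_size -- computes the size of the i-th subset (condition 3)
    threshold      -- assumed stopping value for the accumulated size;
                      in practice it would be derived from the bound b
    max_samples    -- assumed cap on the number of samples drawn
    """
    if n <= 0 or max_samples < 1 or threshold <= 0:
        raise ValueError("n, max_samples and threshold must be positive")
    rng = rng or random.Random()
    total = 0
    samples = 0
    # Adaptive part: keep sampling until enough answer tuples have been
    # seen, or until the sample cap is reached.
    while total < threshold and samples < max_samples:
        i = rng.randrange(n)  # choose a subset uniformly at random
        total += partition_size(i)
        samples += 1
    # Scale the mean sampled subset size by the number of subsets.
    return n * total / samples


def join_partition_sizes(
    r: Sequence[int], s: Sequence[int]
) -> Callable[[int], int]:
    """Hypothetical equi-join partitioning: subset i holds the join
    tuples produced by r[i], so its size is the number of matches in s."""
    s_counts = Counter(s)
    return lambda i: s_counts[r[i]]
```

For a join of `r` and `s`, one would call `estimate_query_size(len(r), join_partition_sizes(r, s), threshold, max_samples)`. Here each subset size is at most `len(s)`, which plays the role of the bound in condition (2). The estimate is random, so repeated calls generally return different values.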