Analysis and application of adaptive sampling

An estimation algorithm for a query is a probabilistic algorithm that computes an approximation of the size (number of tuples) of the query's result. One class of estimation algorithms uses a form of statistical sampling known as adaptive sampling. Several versions of adaptive sampling have been developed by other researchers. The original version has been surpassed in some ways by a newer version and by a more specialized Monte Carlo algorithm. This work presents an analysis of the cost of the original version and compares the three algorithms. The analysis is used to derive an upper bound on the number of samples required by the original algorithm. Contrary to what seems to be a commonly held opinion, none of the algorithms is generally better than the other two: which one is superior depends on the query being estimated and on the criteria being applied. A further question studied is which classes of logically definable queries have fast estimation algorithms. Evidence from descriptive complexity theory indicates that not all such queries have fast estimation algorithms. However, it is shown that on classes of structures of bounded degree, all first-order queries have fast estimation algorithms.
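As an illustration only, the general idea behind adaptive sampling can be sketched as follows. The sketch assumes the query result is split into disjoint partitions whose sizes can each be computed on demand. Partitions are sampled with replacement until the accumulated size reaches a threshold or a sample cap is hit. The names `adaptive_sampling_estimate`, `threshold`, and `max_samples` are hypothetical, and the stopping values are free parameters here, not the bounds analyzed in this work.

```python
import random


def adaptive_sampling_estimate(partition_sizes, threshold, max_samples, rng=None):
    """Estimate sum(partition_sizes) by sampling partitions with replacement.

    Sampling stops once the accumulated size reaches `threshold` or after
    `max_samples` draws, whichever comes first. Returns (estimate, draws).
    """
    if threshold <= 0 or max_samples < 1 or not partition_sizes:
        raise ValueError("need threshold > 0, max_samples >= 1, non-empty input")
    rng = rng or random.Random()
    n = len(partition_sizes)
    total = 0   # accumulated size of the sampled partitions
    draws = 0   # number of samples taken so far
    while total < threshold and draws < max_samples:
        total += partition_sizes[rng.randrange(n)]
        draws += 1
    # Scale the mean sampled partition size up to all n partitions.
    return n * total / draws, draws
```

The number of samples is not fixed in advance: a query with a large result reaches the threshold after few draws, while a query with a small result runs until the cap.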
