Antisampling for Estimation: An Overview

We survey a new way to get quick estimates of simple statistics (such as count, mean, standard deviation, maximum, median, and mode frequency) on a large data set. This approach is a comprehensive attempt (apparently the first) to estimate statistics without any sampling. Our "antisampling" techniques have analogies to those of sampling and exhibit similar estimation accuracy, but they can run much faster than sampling on large computer databases. Antisampling exploits computer science ideas from database theory and expert systems to build an auxiliary structure called a "database abstract." We make detailed comparisons to several different kinds of sampling.
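
As a rough illustration of estimating without sampling, the sketch below bounds the mean of a subset using only statistics precomputed for its parent set. It is not the rule set of the paper: the names AbstractEntry and bound_subset_mean are hypothetical, and the sketch assumes the stored parent statistics are exact and the subset size is known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractEntry:
    """Hypothetical precomputed statistics for one parent set."""
    count: int
    minimum: float
    maximum: float
    mean: float


def bound_subset_mean(parent: AbstractEntry, subset_count: int) -> tuple[float, float]:
    """Return (low, high) bounds on the mean of any subset of the given size.

    Only the stored parent statistics are used; no data item is read.
    """
    if not 0 < subset_count <= parent.count:
        raise ValueError("subset_count must be in 1..parent.count")
    total = parent.count * parent.mean
    rest = parent.count - subset_count
    # The items outside the subset sum to between rest*minimum and
    # rest*maximum, which constrains the sum of the subset itself.
    low = (total - rest * parent.maximum) / subset_count
    high = (total - rest * parent.minimum) / subset_count
    # Every item lies in [minimum, maximum], so the subset mean does too.
    return max(low, parent.minimum), min(high, parent.maximum)


if __name__ == "__main__":
    parent = AbstractEntry(count=100, minimum=0.0, maximum=10.0, mean=4.0)
    print(bound_subset_mean(parent, 80))
```

The bounds tighten as the subset approaches the size of the parent set and collapse to the parent mean when the two coincide.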
