Absolute Bounds on Set Intersection and Union Sizes from Distribution Information

A catalog of quick closed-form bounds on the sizes of set intersections and unions is presented; the bounds can be expressed as rules and managed by a rule-based system architecture. The methods use a variety of statistics precomputed on the data, and exploit homomorphisms (onto mappings) of the data items onto distributions that can be more easily analyzed. They are always applicable, but tend to work best when there are strong or complex correlations in the data, a circumstance that the standard independence-assumption and distributional-assumption estimates handle poorly.
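
To make the idea of an absolute bound concrete, the sketch below shows the elementary bounds that follow from set sizes alone, plus one tighter upper bound obtained by mapping items onto a histogram of some attribute. It is an illustration under stated assumptions, not the catalog itself: the function names, the dictionary-based histogram format, and the choice of these particular bounds are all hypothetical and are not taken from the paper.

```python
def intersection_bounds(size_a, size_b, universe_size):
    """Elementary bounds on |A ∩ B| given only |A|, |B| and the universe size."""
    lower = max(0, size_a + size_b - universe_size)  # overlap forced by pigeonhole
    upper = min(size_a, size_b)                      # cannot exceed the smaller set
    return lower, upper


def union_bounds(size_a, size_b, universe_size):
    """Elementary bounds on |A ∪ B| given only |A|, |B| and the universe size."""
    lower = max(size_a, size_b)                      # union contains the larger set
    upper = min(universe_size, size_a + size_b)      # disjoint case, capped by universe
    return lower, upper


def histogram_intersection_upper(hist_a, hist_b):
    """Upper bound on |A ∩ B| from per-value counts of one attribute.

    hist_a and hist_b map each attribute value to the number of items of
    A (resp. B) having that value. Items with different values cannot
    coincide, so the intersection within each value is at most the
    smaller of the two counts.
    """
    return sum(min(count, hist_b.get(value, 0)) for value, count in hist_a.items())


if __name__ == "__main__":
    print(intersection_bounds(70, 50, 100))
    print(union_bounds(70, 50, 100))
    print(histogram_intersection_upper({"x": 3, "y": 7}, {"x": 5, "y": 1}))
```

In the last call the size-only upper bound would be min(10, 6) = 6, while the histogram lowers it to 4, which shows how precomputed distribution information can tighten a bound without any independence assumption.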
