Aggregate Evaluability in Statistical Databases

Usually a statistical database contains many summary tables, each representing the distribution of the same statistical variable over the classes of a different partition of a certain universe of objects. Existing query systems allow only queries on single tables. However, in most cases additional queries can be evaluated by suitably combining the information contained in similar tables. In order to improve the responsiveness of the database and allow an integrated use of the stored information, we propose to inform the database system of the relationship among the partitions adopted in the tables. Such a relationship, called intersection dependency, states which classes of the partitions have a nonempty intersection and can be represented by a uniform multipartite hypergraph, called the intersection hypergraph. On the grounds of the algebraic properties of the intersection hypergraph and under the assumption of data additivity, we shall provide a characterization of evaluable queries, which allows us to define polynomial-time procedures both for testing evaluability and for evaluating queries.

[...] attribute” [14, 20] related to a given universe of objects or individuals, partitioned according to a set of (category) attributes, referred to as the scheme of the table.

Example 1. Universe: Soviet people in the year 1959. Variable: Population (1000 individuals). Scheme: {Sex, Schooling, Party-Membership} (the data is obtained by processing data from Bishop et al. [4]).

Table: Distribution of the Soviet population by schooling, sex and party membership (1000 individuals), 1959. Rows: Sex / Schooling. Columns: Party-Membership (Yes, No).
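The idea of answering a query on one partition from a table stored for another can be illustrated with a small sketch. It is not the characterization given in the paper: it handles only the simplified case of two partitions A and B of the same universe, and it assumes that the intersection dependency is supplied as a list of (A-class, B-class) pairs with nonempty intersection. All names (`components`, `evaluate_from_b`, the classes `a1`..`b3`, and the figures in `TABLE_B`) are hypothetical. The sketch relies on one fact: within a connected component of the bipartite intersection graph, the union of the A-classes equals the union of the B-classes, so under data additivity their totals coincide.

```python
from collections import defaultdict


def components(pairs):
    """Connected components of the bipartite intersection graph.

    `pairs` lists (a, b) such that class a of partition A and class b
    of partition B have a nonempty intersection (assumed input format).
    """
    adj = defaultdict(set)
    for a, b in pairs:
        adj[("A", a)].add(("B", b))
        adj[("B", b)].add(("A", a))
    seen, comps = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps


def evaluate_from_b(query_a, pairs, table_b):
    """Total of the variable over the A-classes in `query_a`, using table B only.

    Returns None when the query is not a union of whole components,
    i.e. when this simplified test cannot evaluate it.
    """
    query, covered, total = set(query_a), set(), 0
    for comp in components(pairs):
        a_side = {name for side, name in comp if side == "A"}
        if a_side & query:
            if not a_side <= query:
                return None  # query splits a component
            covered |= a_side
            # Additivity: sum the B-classes of the same component.
            total += sum(table_b[name] for side, name in comp if side == "B")
    return total if covered == query else None


# Hypothetical toy data, not taken from the paper.
PAIRS = [("a1", "b1"), ("a2", "b1"), ("a3", "b2"), ("a3", "b3")]
TABLE_B = {"b1": 50, "b2": 20, "b3": 30}

print(evaluate_from_b({"a1", "a2"}, PAIRS, TABLE_B))
print(evaluate_from_b({"a1"}, PAIRS, TABLE_B))
```

In this toy case the query {a1, a2} covers exactly the same objects as class b1, so its total can be read from table B, whereas {a1} alone cannot be separated from a2 using table B. Both steps (component search and summation) run in time polynomial in the number of classes and intersection pairs.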

[1]  Samuel Kotz,et al.  Encyclopedia of Statistical Sciences. Volume 7: Plackett Family of Distributions-Regression, Wrong. , 1988 .

[2]  Arie Shoshani,et al.  Statistical Databases: Characteristics, Problems, and some Solutions , 1982, VLDB.

[3]  H. Sato Handling summary information in a database: derivability , 1981, SIGMOD '81.

[4]  Francesco M. Malvestuto Answering queries in categorical databases , 1987, PODS '87.

[5]  Stephen E. Fienberg,et al.  Discrete Multivariate Analysis: Theory and Practice , 1976 .

[6]  Anthony C. Klug Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions , 1982, JACM.

[7]  Zdzislaw Pawlak,et al.  Using partitioned databases for statistical data analysis , 1981, AFIPS '81.

[8]  Maurizio Rafanelli,et al.  An Algebra for Statistical Data , 1986, SSDBM.

[9]  Garrett Birkhoff,et al.  A survey of modern algebra , 1942 .

[10]  Z. Meral Ozsoyoglu,et al.  An extension of relational algebra for summary tables , 1983 .

[11]  Francesco M. Malvestuto The derivation problem of summary data , 1988, SIGMOD '88.

[12]  Neil C. Rowe,et al.  Antisampling for Estimation: An Overview , 1985, IEEE Transactions on Software Engineering.

[13]  Francesco M. Malvestuto,et al.  The Classification Problem with Semantically Heterogeneous Data , 1988, SSDBM.

[14]  Gultekin Özsoyoglu,et al.  An Extension of Relational Algebra for Summary Tables , 1983, SSDBM.

[15]  Gultekin Özsoyoglu,et al.  Extending relational algebra and relational calculus with set-valued attributes and aggregate functions , 1987, TODS.

[16]  Claude Berge  Graphs and Hypergraphs , 1973 .

[17]  Arie Shoshani,et al.  Scientific and Statistical Data Management Research at LBL , 1986, SSDBM.

[18]  Georges Hébrail A Model of Summaries for Very Large Databases , 1986, SSDBM.

[19]  Sakti P. Ghosh Statistical relational tables for statistical database management , 1986, IEEE Transactions on Software Engineering.