Summarizability is an extremely important property of OLAP (online analytical processing) and statistical databases, because violating it can lead to erroneous conclusions and decisions. In this paper, we explore the conditions for summarizability. We introduce a framework for precisely specifying the context in which statistical objects are defined, and we use a three-step process to define normalized statistical objects. Using this framework, we identify three necessary conditions for summarizability. For each condition we provide specific tests that can be verified either from semantic knowledge or by checking the statistical database itself. We also give our reasons for believing that these three conditions are sufficient as well.
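To make the risk of a summarizability violation concrete, the following minimal sketch shows how rolling up per-category counts can give a wrong total when the category mapping is not disjoint. The data, the function names, and the choice of disjointness as the property being checked are all assumptions made for illustration; the abstract does not name the three conditions, and this is not the paper's own test procedure.

```python
from collections import defaultdict

# Hypothetical data: a student may be enrolled in more than one department,
# so the student -> department mapping is NOT disjoint.
enrollments = [
    ("alice", "math"),
    ("alice", "physics"),
    ("bob", "math"),
    ("carol", "physics"),
]


def count_by_department(rows):
    """Count enrollment rows per department (the lower-level summary)."""
    counts = defaultdict(int)
    for _student, dept in rows:
        counts[dept] += 1
    return dict(counts)


def is_disjoint(rows):
    """Return True if every student maps to exactly one department."""
    depts_of = defaultdict(set)
    for student, dept in rows:
        depts_of[student].add(dept)
    return all(len(depts) == 1 for depts in depts_of.values())


per_department = count_by_department(enrollments)

# Naive roll-up: add the department counts to get a university-wide total.
rolled_up_total = sum(per_department.values())

# Correct total: count distinct students directly from the base data.
true_total = len({student for student, _dept in enrollments})

print(per_department, rolled_up_total, true_total, is_disjoint(enrollments))
```

Here the rolled-up total counts the student enrolled in two departments twice, so it disagrees with the distinct count taken from the base data. A check of this kind, run against the database itself, is one illustrative way to detect that a summary cannot safely be aggregated further.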