OLAP and statistical databases: similarities and differences

During the 1980s there was considerable activity in the area of Statistical Databases, focusing mostly on socioeconomic applications such as census data and national production and consumption patterns. In the 1990s the area of On-Line Analytic Processing (OLAP) was introduced for the analysis of transaction-based business data, such as retail store transactions. Both areas deal with the representation and support of data in a multi-dimensional space. Much of the OLAP literature does not refer to the Statistical Database literature, perhaps because the connection between analyzing business data and socioeconomic data is not obvious. Furthermore, there are papers published in one area or the other whose results can be applied in both application areas. In this paper, we compare the work done in these two areas. We discuss the concepts used in the conceptual modeling of the data and the operations over them, efficient physical organization and access methods, and privacy issues. We point out the terminology used in each area and the correspondence between terms. We identify which research aspects are emphasized in each area and the reasons for that emphasis. We conclude by arguing for the support of a Statistical Object data type as one of the fundamental structures that object-oriented data models and systems should support.
