Efficient Aggregation Algorithms for Compressed Data Warehouses

Aggregation and cube are important operations in online analytical processing (OLAP). Many efficient algorithms have been developed to compute aggregation and cube for relational OLAP, and some work has addressed efficient cube computation for multidimensional data warehouses, which store data sets in multidimensional arrays rather than in tables. To our knowledge, however, no aggregation algorithms for compressed multidimensional data warehouses have been described in the literature to date. This paper presents a set of such algorithms for multidimensional OLAP. The algorithms operate directly on data sets compressed by mapping-complete compression methods, without first decompressing them. Their performance varies as a function of the data set parameters, the output size, and the available main memory. We describe each algorithm and give its I/O and CPU cost functions, and we propose a decision procedure that selects the most efficient algorithm for a given aggregation request. Analysis and experimental results show that the algorithms outperform previous aggregation algorithms on sparse data.
