Shrinked Data Marts Enabled for Negative Caching

Data marts that store pre-aggregated data, prepared for further roll-ups, play an essential role in data warehouse environments and yield significant performance gains in query evaluation. However, to guarantee complete query results from the data mart without accessing the underlying data warehouse, null values must be stored explicitly; this process is denoted as negative caching. Such null values typically occur in multidimensional data sets, which are naturally very sparse. To our knowledge, there is no prior work on shrinking the null tuples of a multidimensional data set within ROLAP. For these tuples, we propose a lossless compression technique that leads to a dramatic reduction in the size of the data mart. Queries that depend on null value information can be answered with 100% precision by partially inflating the shrunken data mart. We complement our analytical approach with an experimental evaluation using real and synthetic data sets, and demonstrate our results
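To make the idea of shrinking and partially inflating null tuples concrete, the following is a minimal sketch and not the technique proposed in the paper, whose details are not given in this abstract. It assumes a hypothetical two-dimensional mart whose cells are linearized in row-major order, and it swaps in plain run-length encoding as a stand-in lossless scheme. The names `shrink_nulls` and `inflate_row` are invented for illustration.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col) in a hypothetical 2-D data mart
Run = Tuple[int, int]   # (start position, run length) in row-major order


def shrink_nulls(null_cells: List[Cell], n_cols: int) -> List[Run]:
    """Run-length encode the explicitly cached null cells (assumed scheme)."""
    positions = sorted({r * n_cols + c for r, c in null_cells})
    runs: List[Run] = []
    for pos in positions:
        if runs and runs[-1][0] + runs[-1][1] == pos:
            # Position directly extends the previous run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((pos, 1))
    return runs


def inflate_row(runs: List[Run], row: int, n_cols: int) -> List[Cell]:
    """Partially inflate: recover only the null cells of one row."""
    lo, hi = row * n_cols, (row + 1) * n_cols
    cells: List[Cell] = []
    for start, length in runs:
        # Clip each run to the requested row before expanding it.
        for pos in range(max(start, lo), min(start + length, hi)):
            cells.append(divmod(pos, n_cols))
    return cells


nulls = [(0, 1), (0, 2), (0, 3), (1, 0), (2, 2)]
runs = shrink_nulls(nulls, n_cols=4)
row_one = inflate_row(runs, row=1, n_cols=4)
```

In this toy setting, five null tuples collapse into two runs, and a query that touches only row 1 expands just the overlapping part of one run. Because every run can be expanded back to its exact cells, the encoding loses no null information.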
