Efficient incremental maintenance of data cubes

The data cube provides users with aggregated results that are group-bys for all possible combinations of dimension attributes. When the number of dimension attributes is $n$, the data cube computes $2^n$ group-bys, each of which is called a cuboid. A data cube is often precomputed and stored as materialized views in a data warehouse, so it must be updated when the source relations change. Incremental maintenance of a data cube computes and propagates only the changes to the source relations, rather than recomputing the entire data cube from the source relations. To maintain a data cube incrementally, previous methods compute a delta cube, which represents the change to the data cube. We call a cuboid in a delta cube a delta cuboid. For a data cube with $2^n$ cuboids, a delta cube consists of $2^n$ delta cuboids, so the cost of computing the delta cube grows exponentially as the number of dimension attributes increases. In this paper, we propose an incremental maintenance method that can maintain a data cube by using only $\binom{n}{\lceil n/2 \rceil}$ delta cuboids (for example, 20 instead of 64 when $n = 6$). As a result, the cost of computing delta cuboids is substantially reduced. Through various experiments, we show the performance advantages of our method over the previous methods.
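The following is a minimal Python sketch of the baseline approach the abstract describes, not the proposed method. It materializes all $2^n$ cuboids, builds a delta cube with one delta cuboid per cuboid from the inserted and deleted rows, and folds that delta cube into the stored cube. The function names, the toy `store`/`product`/`month` schema, and the choice to keep SUM together with COUNT (COUNT lets the sketch tell when a deletion empties a group) are illustrative assumptions. The last helper only compares the two delta-cuboid counts stated in the abstract; how the proposed method chooses its $\binom{n}{\lceil n/2 \rceil}$ delta cuboids is not described here and is not modeled.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def all_cuboids(dims):
    """All 2^n subsets of the dimension attributes, i.e. every group-by."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def compute_cube(signed_rows, dims, measure):
    """Map each cuboid to {group key: (SUM(measure), COUNT(*))}.

    signed_rows holds (record, sign) pairs: sign=+1 for an inserted row,
    sign=-1 for a deleted row. With only +1 rows this builds the base cube;
    with mixed signs it builds a delta cube (one delta cuboid per cuboid).
    """
    signed_rows = list(signed_rows)
    cube = {}
    for cuboid in all_cuboids(dims):
        groups = defaultdict(lambda: [0, 0])
        for record, sign in signed_rows:
            key = tuple(record[d] for d in cuboid)
            groups[key][0] += sign * record[measure]
            groups[key][1] += sign
        cube[cuboid] = {k: tuple(v) for k, v in groups.items()}
    return cube


def apply_delta(cube, delta):
    """Fold every delta cuboid into its materialized cuboid, in place."""
    for cuboid, groups in delta.items():
        target = cube[cuboid]
        for key, (d_sum, d_count) in groups.items():
            old_sum, old_count = target.get(key, (0, 0))
            if old_count + d_count == 0:
                target.pop(key, None)  # every row of this group was deleted
            else:
                target[key] = (old_sum + d_sum, old_count + d_count)
    return cube


def delta_cuboid_counts(n):
    """(2^n delta cuboids for the baseline, C(n, ceil(n/2)) from the abstract)."""
    return 2 ** n, comb(n, (n + 1) // 2)


if __name__ == "__main__":
    dims = ("store", "product", "month")
    base = [
        {"store": "S1", "product": "P1", "month": 1, "sales": 10},
        {"store": "S1", "product": "P2", "month": 1, "sales": 5},
        {"store": "S2", "product": "P1", "month": 2, "sales": 7},
    ]
    inserted = [{"store": "S2", "product": "P2", "month": 2, "sales": 3}]
    deleted = [base[1]]

    cube = compute_cube([(r, 1) for r in base], dims, "sales")
    delta = compute_cube(
        [(r, 1) for r in inserted] + [(r, -1) for r in deleted], dims, "sales"
    )
    apply_delta(cube, delta)

    # The incrementally maintained cube matches a full recomputation.
    current = [base[0], base[2]] + inserted
    assert cube == compute_cube([(r, 1) for r in current], dims, "sales")
    print(len(cube), cube[()][()])  # 8 (20, 3)

    for n in range(2, 9):
        print(n, *delta_cuboid_counts(n))
```

In this baseline, every changed row is aggregated into all $2^n$ delta cuboids, which is why its cost grows exponentially with $n$; that is the cost the proposed method aims to reduce.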
