An efficient method for maintaining data cubes incrementally

The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Because computing a data cube is typically expensive, the cube is often precomputed and stored as materialized views in a data warehouse. A materialized data cube must be updated when its source relations change. Incremental maintenance computes and propagates only the changes to the data cube, rather than recomputing the entire cube from scratch. For n dimension attributes, the data cube consists of 2^n group-bys, each of which is called a cuboid. To incrementally maintain a data cube with 2^n cuboids, conventional methods compute 2^n delta cuboids, each of which represents the change to one cuboid. In this paper, we propose an efficient incremental maintenance method that maintains a data cube using only a subset of the 2^n delta cuboids. We formulate an optimization problem to find the subset of the 2^n delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only C(n, ⌈n/2⌉) delta cuboids, where C(n, k) denotes the binomial coefficient. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
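As a minimal sketch of the conventional baseline described above (not the method proposed in the paper), the following Python code builds all 2^n cuboids for a small fact table, computes one delta cuboid per cuboid from newly inserted rows, and merges the deltas into the materialized cube. The table layout, the attribute names (`item`, `store`, `day`, `qty`), and the helper names are hypothetical, and the sketch assumes a SUM aggregate with insert-only changes.

```python
from collections import defaultdict
from itertools import combinations


def cuboids(dims):
    """Return every subset of the dimension attributes: 2^n group-bys."""
    return [c for r in range(len(dims) + 1) for c in combinations(dims, r)]


def compute_cube(rows, dims, measure):
    """Compute SUM(measure) per group for every cuboid.

    Returns {cuboid: {group_key: sum}}. Applied to the changed rows
    instead of the base rows, the same function yields the delta cuboids.
    """
    cube = {c: defaultdict(int) for c in cuboids(dims)}
    for row in rows:
        for c, groups in cube.items():
            groups[tuple(row[d] for d in c)] += row[measure]
    return cube


def apply_delta(cube, delta):
    """Merge each delta cuboid into its cuboid (assumes insert-only, SUM)."""
    for c, groups in delta.items():
        for key, change in groups.items():
            cube[c][key] += change


# Hypothetical data: three dimensions, so the cube has 2^3 = 8 cuboids.
dims = ("item", "store", "day")
base = [
    {"item": "a", "store": "s1", "day": 1, "qty": 2},
    {"item": "b", "store": "s1", "day": 1, "qty": 5},
    {"item": "a", "store": "s2", "day": 2, "qty": 1},
]
inserted = [
    {"item": "a", "store": "s1", "day": 1, "qty": 4},
    {"item": "c", "store": "s2", "day": 3, "qty": 7},
]

cube = compute_cube(base, dims, "qty")        # materialized cube
delta = compute_cube(inserted, dims, "qty")   # one delta cuboid per cuboid
apply_delta(cube, delta)                      # incremental refresh
```

In this baseline the delta computation touches all 2^n cuboids for every changed row; the maintained cube matches a full recomputation over `base + inserted`, but only the inserted rows are scanned.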
