PnP: parallel and external memory iceberg cube computation

We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. Its novelty lies in tightly integrating top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient in all three of the following settings: (1) sequential iceberg-cube queries; (2) external memory iceberg-cube queries; (3) parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.
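
To make the terms above concrete, the following is a minimal sketch of what an iceberg-cube query returns and of the bottom-up pruning idea. It is not the PnP algorithm: it omits top-down piping, external memory handling, and parallelism entirely. It assumes COUNT as the aggregate, and the names `iceberg_cube`, `rows`, and `min_support` are hypothetical, chosen only for illustration. The pruning relies on the fact that a cell can reach the support threshold only if every projection of it onto one fewer dimension also does.

```python
from collections import Counter
from itertools import combinations


def iceberg_cube(rows, num_dims, min_support):
    """Naive iceberg cube with COUNT as the aggregate (illustrative only).

    Returns {dims: {cell: count}}, keeping only cells whose count is at
    least min_support. Group-bys are processed from few dimensions to
    many so that smaller group-bys can prune larger ones.
    """
    result = {}
    for k in range(1, num_dims + 1):
        for dims in combinations(range(num_dims), k):
            counts = Counter()
            for row in rows:
                cell = tuple(row[d] for d in dims)
                # Pruning step: skip the row if any (k-1)-dimensional
                # projection of this cell already failed the threshold.
                if k > 1 and any(
                    cell[:i] + cell[i + 1:]
                    not in result.get(dims[:i] + dims[i + 1:], {})
                    for i in range(k)
                ):
                    continue
                counts[cell] += 1
            kept = {c: n for c, n in counts.items() if n >= min_support}
            if kept:
                result[dims] = kept
    return result


# Hypothetical two-dimensional input, one tuple per fact row.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y")]
cube = iceberg_cube(rows, num_dims=2, min_support=2)
```

In this example the row `("b", "y")` is never counted in the two-dimensional group-by, because the one-dimensional cell `("b",)` already fell below the threshold. The sketch rescans the input once per group-by, which is exactly the kind of cost that sharing aggregation work across group-bys is meant to avoid.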
