ARCube: supporting ranking aggregate queries in partially materialized data cubes

Supporting ranking queries in database systems has recently been a popular research topic. However, little work has studied ranking queries in data warehouses, where ranking is on multidimensional aggregates rather than on measures of base facts. To address this problem, we propose a query execution model that answers different types of ranking aggregate queries based on a unified, partial cube structure, ARCube. The query execution model follows a candidate generation and verification framework, in which the most promising candidate cells are generated using a set of high-level guiding cells. We also identify a bounding principle for effective pruning: once a guiding cell is pruned, all of its child candidate cells can be pruned. We further address the problem of efficient online candidate aggregation and verification by developing a chunk-based execution model that verifies candidates in bulk within a bounded memory buffer. Our extensive performance study shows that the new framework not only yields an order-of-magnitude performance improvement over the state-of-the-art method, but is also much more flexible in the types of ranking aggregate queries it supports.
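
The candidate generation and verification idea, together with the bounding principle, can be illustrated with a small sketch. This is not the ARCube implementation: it is a toy example that assumes a two-dimensional fact table, a SUM aggregate over non-negative measures, and a single coarser group-by (on the first dimension) acting as the set of guiding cells. Under those assumptions the aggregate of a guiding cell is an upper bound on the aggregate of each of its child cells, so a guiding cell whose bound cannot beat the current k-th best score can be pruned along with all of its children. The function name `top_k_sum` and the data layout are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_sum(facts, k):
    """Rank cells of the group-by (A, B) by SUM(measure) and return the top k.

    facts: iterable of (a, b, measure) tuples with non-negative measures.
    Returns (ranked list of (score, (a, b)), number of candidate cells verified).
    """
    # Guiding cells: aggregates of the coarser group-by (A), assumed materialized.
    guide = defaultdict(float)
    # Base facts indexed by guiding cell, so candidates are aggregated on demand.
    by_a = defaultdict(list)
    for a, b, m in facts:
        guide[a] += m
        by_a[a].append((b, m))

    top = []       # min-heap holding the current top-k (score, cell) pairs
    verified = 0   # how many candidate cells were actually aggregated

    # Visit guiding cells from the most to the least promising bound.
    for a, bound in sorted(guide.items(), key=lambda kv: -kv[1]):
        # Bounding principle: no child of this guiding cell can exceed `bound`,
        # and later guiding cells have even smaller bounds, so stop here.
        if len(top) == k and bound <= top[0][0]:
            break
        # Candidate generation: child cells (a, b) of the guiding cell.
        cells = defaultdict(float)
        for b, m in by_a[a]:
            cells[b] += m
        # Verification: compare each exact aggregate with the current top-k.
        for b, score in cells.items():
            verified += 1
            item = (score, (a, b))
            if len(top) < k:
                heapq.heappush(top, item)
            elif item > top[0]:
                heapq.heapreplace(top, item)

    return sorted(top, reverse=True), verified


facts = [("x", "p", 5), ("x", "q", 4), ("y", "p", 3), ("y", "q", 1), ("z", "p", 2)]
ranked, verified = top_k_sum(facts, 2)
print(ranked, verified)
```

In this example only the children of guiding cell `x` are verified; the guiding cells `y` and `z` are pruned because their bounds do not exceed the current second-best score. The chunk-based execution model mentioned in the abstract is not modelled here.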
