Supporting ad-hoc ranking aggregates

This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which return the k groups with the highest aggregate values. Current systems lack essential support for such queries and process them with a naïve materialize-group-sort scheme that can be prohibitively inefficient. Our framework is based on three fundamental principles. The Upper-Bound Principle dictates the requirements for early pruning, and the Group-Ranking and Tuple-Ranking Principles dictate the group-ordering and tuple-ordering requirements. Together, they guide the query processor toward a provably optimal tuple schedule for aggregate query processing. We propose a new execution framework that applies these principles and requirements. We address the challenges of realizing the framework and implementing new query operators, which enable efficient group-aware and rank-aware query plans. The experimental study validates our framework by demonstrating orders-of-magnitude performance improvements of the new query plans over traditional plans.
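To make the contrast concrete, the following is a minimal illustrative sketch, not the paper's operators or algorithm. It compares a materialize-group-sort baseline with a simple form of upper-bound pruning for a top-k SUM query. The function names, the `max_score` parameter, and the assumptions that scores are non-negative, bounded by `max_score`, already partitioned by group, and that group sizes are known in advance are all hypothetical choices made for this example. The sketch visits groups in arbitrary order and does not attempt the group or tuple ordering that the Group-Ranking and Tuple-Ranking Principles address.

```python
import heapq


def naive_top_k(groups, k):
    """Materialize-group-sort baseline: fully aggregate every group, then rank."""
    totals = {g: sum(scores) for g, scores in groups.items()}
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])


def pruned_top_k(groups, k, max_score):
    """Top-k SUM with upper-bound pruning (illustrative assumptions only).

    Assumes every score lies in [0, max_score] and each group's size is known,
    so partial_sum + remaining * max_score bounds the group's final sum.
    Returns (top-k list of (group, total), number of tuples actually read).
    """
    top = []          # min-heap of (total, group) holding the best k so far
    tuples_read = 0
    for group, scores in groups.items():
        partial = 0
        pruned = False
        for i, score in enumerate(scores):
            remaining = len(scores) - i
            upper_bound = partial + remaining * max_score
            # The group cannot beat the current k-th best total: stop reading it.
            if len(top) == k and upper_bound < top[0][0]:
                pruned = True
                break
            partial += score
            tuples_read += 1
        if pruned:
            continue
        if len(top) < k:
            heapq.heappush(top, (partial, group))
        elif partial > top[0][0]:
            heapq.heapreplace(top, (partial, group))
    ranked = [(g, t) for t, g in sorted(top, reverse=True)]
    return ranked, tuples_read


if __name__ == "__main__":
    data = {"a": [9, 8, 9], "b": [1, 2, 1], "c": [5, 4]}
    print(naive_top_k(data, 1))
    print(pruned_top_k(data, 1, max_score=10))
```

On the small example input, both functions agree on the top group, while the pruned version stops reading a group as soon as its upper bound falls below the current k-th best total. How much work is saved depends heavily on the order in which groups and tuples are visited.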
