Optimizing Iceberg Queries with Complex Joins

Iceberg queries, commonly used for decision support, find groups whose aggregate values are above or below a threshold. In practice, iceberg queries are often posed over complex joins that are expensive to evaluate. This paper proposes a framework for combining three techniques (a-priori, memoization, and pruning) to optimize iceberg queries with complex joins. A-priori pushes partial GROUP BY and HAVING conditions before a join to reduce its input size. Memoization caches and reuses join computation results. Pruning uses cached results to infer that certain tuples cannot contribute to the final query result, and short-circuits the join computation for them. We formally derive conditions for correctly applying these techniques. Our practical rewrite algorithm produces highly efficient SQL that can exploit combinations of optimization opportunities in ways previously not possible. We evaluate our PostgreSQL-based implementation experimentally and show that it outperforms both baseline PostgreSQL and a commercial database system.
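As a rough illustration of the a-priori idea described above, the sketch below compares an iceberg query over a self-join with a hand-written variant that applies a partial GROUP BY and HAVING filter before the join. It is not the paper's rewrite algorithm: the `basket` table, its columns, the sample rows, and the threshold `k` are all hypothetical, and SQLite is used through Python's standard `sqlite3` module only to keep the example self-contained. The rewrite shown is valid here because the sample data contains each (bid, item) pair at most once, so an item pair can reach support `k` only if each item on its own appears in at least `k` baskets.

```python
import sqlite3


def build_db():
    """Create an in-memory database with a small, hypothetical basket table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE basket (bid INTEGER, item TEXT)")
    rows = [
        (1, "milk"), (1, "bread"), (1, "eggs"),
        (2, "milk"), (2, "bread"),
        (3, "milk"), (3, "bread"), (3, "jam"),
        (4, "eggs"), (4, "tea"),
    ]
    conn.executemany("INSERT INTO basket VALUES (?, ?)", rows)
    return conn


# Baseline iceberg query: join first, then group and apply the threshold.
BASELINE = """
SELECT a.item, b.item, COUNT(*) AS support
FROM basket a JOIN basket b ON a.bid = b.bid AND a.item < b.item
GROUP BY a.item, b.item
HAVING COUNT(*) >= :k
ORDER BY a.item, b.item
"""

# A-priori style variant: keep only items that are frequent on their own,
# so the self-join sees fewer input rows. The final HAVING is still needed,
# because two frequent items need not be frequent together.
APRIORI = """
WITH frequent AS (
    SELECT item FROM basket GROUP BY item HAVING COUNT(*) >= :k
)
SELECT a.item, b.item, COUNT(*) AS support
FROM basket a JOIN basket b ON a.bid = b.bid AND a.item < b.item
WHERE a.item IN (SELECT item FROM frequent)
  AND b.item IN (SELECT item FROM frequent)
GROUP BY a.item, b.item
HAVING COUNT(*) >= :k
ORDER BY a.item, b.item
"""


def run(conn, sql, k):
    """Execute one of the queries above with threshold k and return all rows."""
    return conn.execute(sql, {"k": k}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(run(conn, BASELINE, 3))
    print(run(conn, APRIORI, 3))
```

Both queries return the same groups on this data; the second simply filters the join inputs first. Whether such a pre-filter is safe in general depends on the aggregate and on the join, which is the kind of condition the paper says it derives formally.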
