Paper information - A novel, low-latency algorithm for multiple Group-By query optimization

Data summarization is essential for users to interact with data. Current state-of-the-art algorithms for optimizing its most general form, multiple Group By queries, have limited scalability. In this paper, we propose a novel algorithm, Top-Down Splitting, that scales to hundreds or even thousands of attributes and queries, and that quickly and efficiently produces optimized query execution plans. We analyze the complexity of our algorithm and empirically evaluate its scalability and effectiveness through an experimental campaign. Results show that our algorithm is remarkably faster than the alternatives from prior work, while generally producing better solutions. Ultimately, our algorithm reduces query execution time by up to 34% compared to unoptimized plans.
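To make the notion of an optimized plan for multiple Group By queries more concrete, the following is a minimal sketch of the general idea of sharing work between queries: a coarser Group By can be derived from a finer one instead of rescanning the base table. It is not the Top-Down Splitting algorithm from the paper. The table contents, the column layout, and the helper `group_by_sum` are hypothetical and chosen only for illustration, and the sketch assumes a distributive aggregate (SUM).

```python
from collections import defaultdict


def group_by_sum(rows, key_idx, val_idx):
    """Return SUM(row[val_idx]) grouped by the columns at positions key_idx."""
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in key_idx)] += row[val_idx]
    return dict(out)


# Hypothetical base table with columns (country, city, product, sales).
rows = [
    ("FR", "Nice", "a", 3),
    ("FR", "Nice", "b", 4),
    ("FR", "Paris", "a", 5),
    ("IT", "Rome", "a", 2),
    ("IT", "Rome", "b", 1),
]

# Unoptimized plan: every query scans the base table independently.
naive_country_city = group_by_sum(rows, (0, 1), 3)
naive_country = group_by_sum(rows, (0,), 3)

# Shared plan: compute the finer grouping (country, city) once, then derive
# the coarser grouping (country) from that smaller intermediate result.
shared_country_city = group_by_sum(rows, (0, 1), 3)
intermediate = [(*key, total) for key, total in shared_country_city.items()]
shared_country = group_by_sum(intermediate, (0,), 2)

print(shared_country_city)
print(shared_country)
```

Both plans return the same answers, but the shared plan reads the base table once and answers the second query from a smaller intermediate result. Choosing which intermediate groupings to materialize across many queries and attributes is the kind of search problem the abstract refers to.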

Pietro Michiardi | Duy-Hung Phan

[1] Raymond T. Ng, et al. Iceberg-cube computation with PC clusters, 2001, SIGMOD '01.

[2] Arian Baer. Two parallel approaches to network data analysis, 2011.

[3] Andrew Rau-Chaplin, et al. Computing Partial Data Cubes, 2003.

[4] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[5] Jeffrey F. Naughton, et al. On the Computation of Multidimensional Aggregates, 1996, VLDB.

[6] Raghu Ramakrishnan, et al. Bottom-up computation of sparse and Iceberg CUBE, 1999, SIGMOD '99.

[7] Hamid Pirahesh, et al. Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, 1996, Data Mining and Knowledge Discovery.

[8] Zhimin Chen, et al. Efficient computation of multiple group by queries, 2005, SIGMOD '05.

[9] Kenneth A. Ross, et al. Adaptive Aggregation on Chip Multiprocessors, 2007, VLDB.

[10] Pietro Michiardi, et al. Efficient and Self-Balanced ROLLUP Aggregates for Large-Scale Data Summarization, 2015, 2015 IEEE International Congress on Big Data.

[11] Kenneth A. Ross, et al. Fast Computation of Sparse Datacubes, 1997, VLDB.