Query Rewriting Based on Meta-Granular Aggregation

Analytic database queries are exceptionally time consuming. Decision support systems employ various execution techniques in order to accelerate such queries and reduce their resource consumption. Probably the most important of them consists in materialization of partial results. However, any introduction of derived objects into the database schema increases the cost of software development, since programmers must take care of their usage and synchronization. In this article we consider using partial aggregations materialized in additional tables. The idea is based on the concept of metagranules that represent the information on grouping and used aggregations. Metagranules have a natural partial order that guides the optimisation process. We present solutions to two problems. Firstly, we assume that a set of stored metagranules is given and we optimize a query. We present a novel query rewriting method to make analytic queries use the information stored in metagranules. We also describe our proof-of-concept implementation of this method and perform an extensive experimental evaluation using databases of the size up to 0:5 TiB and 6 billions rows. Secondly, we assume that a database workload is given and we want to select the optimal set of metagranules to materialize. Although each metagranule accelerates some queries, it also imposes a significant overhead on updates. Therefore, we propose a cost model that includes both benefits for queries and penalties for updates. We experiment with the complete search in the space of sets of metagranules to find the optimum. Finally, we empirically verify identified optimal sets against database instances up to 0:5 TiB with billions of rows and hundreds millions of aggregated rows.

[1]  Hamid Pirahesh,et al.  Robust query processing through progressive optimization , 2004, SIGMOD '04.

[2]  Aleksandra Boniewicz,et al.  Automatic Selection of Functional Indexes for Object Relational Mapping System , 2013 .

[3]  Piotr Synak,et al.  Brighthouse: an analytic data warehouse for ad-hoc queries , 2008, Proc. VLDB Endow..

[4]  Martin L. Kersten,et al.  Database Architecture Evolution: Mammals Flourished long before Dinosaurs became Extinct , 2009, Proc. VLDB Endow..

[5]  Surajit Chaudhuri,et al.  Table of Contents (pdf) , 2007, VLDB.

[6]  Luis Gravano,et al.  STHoles: a multidimensional workload-aware histogram , 2001, SIGMOD '01.

[7]  Piotr Synak,et al.  Two Database Related Interpretations of Rough Approximations: Data Organization and Query Execution , 2013, Fundam. Informaticae.

[8]  Philip A. Bernstein,et al.  Compiling mappings to bridge applications and databases , 2007, SIGMOD '07.

[9]  Daniel Kuhn,et al.  SQPR: Stream query planning with reuse , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[10]  David J. DeWitt,et al.  Efficient mid-query re-optimization of sub-optimal query execution plans , 1998, SIGMOD '98.

[11]  Krzysztof Stencel,et al.  Efficient Implementation of Recursive Queries in Major Object Relational Mapping Systems , 2011, FGIT.

[12]  Inderpal Singh Mumick,et al.  Maintenance of data cubes and summary tables in a warehouse , 1997, SIGMOD '97.

[13]  Piotr Synak,et al.  A Rough-Columnar RDBMS Engine -- A Case Study of Correlated Subqueries , 2012, IEEE Data Eng. Bull..

[14]  Alon Y. Halevy,et al.  Adapting to source properties in processing data integration queries , 2004, SIGMOD '04.

[15]  Bruce G. Lindsay,et al.  How to roll a join: asynchronous incremental view maintenance , 2000, SIGMOD '00.

[16]  Piotr Wisniewski,et al.  Partial Aggregation Using Hibernate , 2011, FGIT.

[17]  Elizabeth J. O'Neil,et al.  Object/relational mapping 2008: hibernate and the entity data model (edm) , 2008, SIGMOD Conference.

[18]  Yannis E. Ioannidis,et al.  The History of Histograms (abridged) , 2003, VLDB.