On the equivalence and rewriting of aggregate queries

Abstract. We introduce a first-order language with real polynomial arithmetic and aggregation operators (count, iterated sum and iterated product), which is well suited for the definition of aggregate queries involving complex statistical functions. It offers a good trade-off between expressive power and complexity, with a tractable data complexity. Interestingly, some fundamental properties of first-order logic with real arithmetic are preserved in the presence of aggregates. In particular, there is an effective quantifier elimination for formulae with aggregation. We then consider the problem of querying data that has already been aggregated in aggregate views, and focus on queries with an aggregation over a conjunctive query (namely single-block aggregate group-by queries without a HAVING clause). Our main conceptual contribution is the introduction of a new equivalence relation among conjunctive queries, the isomorphism modulo a product. We prove that the equivalence of aggregate queries, such as averages, reduces to it. Deciding whether two queries are isomorphic modulo a product is shown to be NP-complete. We then analyze the equivalence problem in the case of aggregate conjunctive queries with comparisons. We introduce the concept of cross isomorphic linear expansions, which generalizes isomorphism modulo a product, and we show that equivalence reduces to it and that it can be decided in PSPACE. Finally, we show that the problem of complete rewriting of count queries using count views is NP-complete, and we introduce new rewriting techniques based on the isomorphism modulo a product to recover the values of counts by complex arithmetical computation from the views.
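As an informal illustration of why products matter for aggregate equivalence, the following Python sketch works under bag semantics with two hypothetical relations `R` and `S` (names and data are assumptions, not taken from the paper). It checks two elementary facts: an average over `R(x)` is unchanged when the query body is multiplied by an independent non-empty relation `S(y)`, and the count of `R` can be recovered by division from count views over `R(x), S(y)` and over `S(y)`. It is not the paper's definition of isomorphism modulo a product, only a small instance of the intuition behind it.

```python
from fractions import Fraction
from itertools import product

def count_q(R):
    """Count over the query q(x) :- R(x), under bag semantics."""
    return len(R)

def avg_q(R):
    """Average of x over q(x) :- R(x)."""
    return Fraction(sum(R), len(R))

def avg_q_times_s(R, S):
    """Average of x over q'(x) :- R(x), S(y): each x appears once per tuple of S."""
    xs = [x for x, _ in product(R, S)]
    return Fraction(sum(xs), len(xs))

# Hypothetical data, chosen only for illustration.
R = [1, 2, 2, 7]
S = ["a", "b", "c"]

# Two count views: one over the product R(x), S(y), one over S(y) alone.
view_rs = len(list(product(R, S)))
view_s = len(S)

# The count of R is recovered arithmetically from the two views.
recovered_count = view_rs // view_s

print(avg_q(R) == avg_q_times_s(R, S))
print(recovered_count == count_q(R))
```

The division step assumes `S` is non-empty; with an empty `S` both views are zero and the count of `R` cannot be recovered from them.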
