Equivalence of SQL queries in presence of embedded dependencies

We consider the problem of finding equivalent minimal-size reformulations of SQL queries in the presence of embedded dependencies [1]. Our focus is on select-project-join (SPJ) queries with equality comparisons, also known as safe conjunctive queries (CQ queries), possibly with grouping and aggregation. For SPJ queries, the SQL standard treats query answers as multisets (bags), whereas the stored relations are treated either as sets (bag-set semantics) or as bags (bag semantics). (Under set semantics, both query answers and stored relations are treated as sets.) In the context of this Query-Reformulation Problem, we develop a comprehensive framework for equivalence of CQ queries under bag and bag-set semantics in the presence of embedded dependencies, and make several conceptual and technical contributions. Specifically, we develop equivalence tests for CQ queries in the presence of arbitrary sets of embedded dependencies under bag and bag-set semantics, provided that the chase [10] under set semantics (set-chase) terminates on the inputs. We also present equivalence tests for CQ queries with grouping and aggregation in the presence of embedded dependencies. We use these equivalence tests to develop algorithms, sound and complete whenever set-chase on the inputs terminates, for solving instances of the Query-Reformulation Problem with CQ queries under each of bag and bag-set semantics, as well as instances of the problem with aggregate queries. Our contributions apply beyond the Query-Reformulation Problem considered in this paper: the results can be used in developing algorithms for rewriting CQ queries, and queries in more expressive languages (e.g., with grouping and aggregation, or with arithmetic comparisons), using views in the presence of embedded dependencies, under bag or bag-set semantics for query evaluation.
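The three semantics distinguished above can be illustrated with a minimal Python sketch. The query Q(x) :- R(x, y), S(y), the relation names R and S, and the sample data are hypothetical and chosen only for illustration; they do not come from the paper, and the code is not the paper's equivalence test.

```python
from collections import Counter

def eval_bag(R, S):
    """Bag semantics: stored relations are bags, and the answer is a bag."""
    # Q(x) :- R(x, y), S(y). Each pair of matching stored tuples,
    # duplicates included, contributes one copy of x to the answer.
    return Counter(x for (x, y) in R for (y2,) in S if y == y2)

def eval_bag_set(R, S):
    """Bag-set semantics: stored relations are sets, the answer is a bag."""
    # Duplicates are removed from the inputs only; the projection onto x
    # may still produce repeated answer tuples.
    return eval_bag(set(R), set(S))

def eval_set(R, S):
    """Set semantics: stored relations and the answer are all sets."""
    return set(eval_bag_set(R, S))

# Hypothetical sample data, with duplicate tuples in both relations.
R = [(1, "a"), (1, "a"), (1, "b"), (2, "b")]
S = [("a",), ("b",), ("b",)]

print(eval_bag(R, S))      # 1 occurs 4 times, 2 occurs 2 times
print(eval_bag_set(R, S))  # 1 occurs 2 times, 2 occurs once
print(eval_set(R, S))      # each of 1 and 2 occurs once
```

On this data the same query returns three different answers, which suggests why equivalence has to be tested separately under each semantics.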

[1]  Alin Deutsch, et al. The chase revisited, 2008, PODS.

[2]  Kyuseok Shim, et al. Optimizing queries with materialized views, 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[3]  Werner Nutt, et al. Rewriting aggregate queries using views, 1999, PODS.

[4]  Sara Cohen. Containment of aggregate queries, 2005, SIGMOD Record.

[5]  Werner Nutt, et al. Deciding equivalences among aggregate queries, 1998, PODS '98.

[6]  Surajit Chaudhuri, et al. Optimization of real conjunctive queries, 1993, PODS '93.

[7]  Ashok K. Chandra, et al. Optimal implementation of conjunctive queries in relational data bases, 1977, STOC '77.

[8]  Alin Deutsch, et al. Query reformulation with constraints, 2006, SIGMOD Record.

[9]  Jennifer Widom, et al. Database Systems: The Complete Book, 2001.

[10]  Phokion G. Kolaitis, et al. The containment problem for Real conjunctive queries with inequalities, 2006, PODS '06.

[11]  Chen Li, et al. Rewriting Queries using Views, 2009, Encyclopedia of Database Systems.

[12]  Vasilis Vassalos, et al. Answering Queries Using Views, 2009, Encyclopedia of Database Systems.

[13]  Alon Y. Halevy, et al. Answering queries using views: A survey, 2001, The VLDB Journal.

[14]  Michael R. Genesereth, et al. Answering recursive queries using views, 1997, PODS '97.

[15]  Serge Abiteboul, et al. Foundations of Databases, 1994.

[16]  Alin Deutsch, et al. XML query reformulation over mixed and redundant storage, 2002.

[17]  Jeffrey D. Ullman, et al. Information integration using logical views, 1997, Theor. Comput. Sci.

[18]  Anthony C. Klug. On conjunctive queries containing inequalities, 1988, JACM.

[19]  Alin Deutsch, et al. Reformulation of XML Queries and Constraints, 2003, ICDT.

[20]  Rada Chirkova, et al. Query evaluation using overlapping views: completeness and efficiency, 2006, SIGMOD Conference.

[21]  Sara Cohen, et al. Equivalence of queries combining set and bag-set semantics, 2006, PODS '06.

[22]  Divesh Srivastava, et al. Answering Queries Using Views, 1999, PODS 1995.

[23]  Ronald Fagin, et al. Data exchange: semantics and query answering, 2003, Theor. Comput. Sci.