Complete yet practical search for minimal query reformulations under constraints

We revisit the Chase&Backchase (C&B) algorithm for query reformulation under constraints, which provides a uniform solution to such particular-case problems as view-based rewriting under constraints, semantic query optimization, and physical access path selection in query optimization. For an important class of queries and constraints, C&B has been shown to be complete, i.e. guaranteed to find all (join-)minimal reformulations under constraints. C&B is based on constructing a canonical rewriting candidate called a universal plan, then inspecting its exponentially many sub-queries in search for minimal reformulations, essentially removing redundant joins in all possible ways. This inspection involves chasing the subquery. Because of the resulting exponentially many chases, the conventional wisdom has held that completeness is a concept of mainly theoretical interest. We show that completeness can be preserved at practically relevant cost by introducing Prov-C&B, a novel reformulation algorithm that instruments the chase to maintain provenance information connecting the joins added during the chase to the universal plan subqueries responsible for adding these joins. This allows it to directly "read off" the minimal reformulations from the result of a single chase of the universal plan, saving exponentially many chases of its subqueries. We exhibit natural scenarios yielding speedups of over two orders of magnitude between the execution of the best view-based rewriting found by a commercial query optimizer and that of the best rewriting found by Prov-C&B (which the optimizer misses because of limited reasoning about constraints).

[1]  Amihai Motro,et al.  An access authorization model for relational databases based on algebraic manipulation of view definitions , 1989, [1989] Proceedings. Fifth International Conference on Data Engineering.

[2]  Werner Nutt,et al.  Rewriting queries with arbitrary aggregation functions using views , 2006, TODS.

[3]  Alin Deutsch,et al.  Reformulation of XML Queries and Constraints , 2003, ICDT.

[4]  Alin Deutsch,et al.  MARS: A System for Publishing XML from Mixed and Redundant Storage , 2003, VLDB.

[5]  Dan Suciu,et al.  Query-Based Data Pricing , 2015, J. ACM.

[6]  Val Tannen,et al.  Object/relational query optimization with chase and backchase , 2000 .

[7]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.

[8]  Alin Deutsch,et al.  Physical Data Independence, Constraints, and Optimization with Universal Plans , 1999, VLDB.

[9]  S. Sudarshan,et al.  Extending query rewriting techniques for fine-grained access control , 2004, SIGMOD '04.

[10]  Bertram Ludäscher,et al.  Processing first-order queries under limited access patterns , 2004, PODS '04.

[11]  Georg Lausen,et al.  On Chase Termination Beyond Stratification , 2009, Proc. VLDB Endow..

[12]  Val Tannen,et al.  Provenance semirings , 2007, PODS.

[13]  Ashok K. Chandra,et al.  Optimal implementation of conjunctive queries in relational data bases , 1977, STOC '77.

[14]  Alin Deutsch,et al.  A chase too far? , 2000, SIGMOD '00.

[15]  Cong Yu,et al.  Constraint-based XML query rewriting for data integration , 2004, SIGMOD '04.

[16]  Rada Chirkova,et al.  Finding Equivalent Rewritings in the Presence of Arithmetic Comparisons , 2006, EDBT.

[17]  V. Vianu,et al.  Edinburgh Why and Where: A Characterization of Data Provenance , 2017 .

[18]  Alin Deutsch,et al.  Rewriting nested XML queries using nested views , 2006, SIGMOD Conference.

[19]  Catriel Beeri,et al.  A Proof Procedure for Data Dependencies , 1984, JACM.

[20]  Ronald Fagin,et al.  Data exchange: semantics and query answering , 2003, Theor. Comput. Sci..

[21]  Alin Deutsch,et al.  Query reformulation with constraints , 2006, SGMD.

[22]  Alon Y. Halevy,et al.  Answering queries using views: A survey , 2001, The VLDB Journal.

[23]  Mohamed Ziauddin,et al.  Materialized Views in Oracle , 1998, VLDB.

[24]  Serge Abiteboul,et al.  Foundations of Databases , 1994 .

[25]  Chen Li,et al.  Answering queries using views with arithmetic comparisons , 2002, PODS '02.

[26]  Divesh Srivastava,et al.  Answering Queries with Aggregation Using Views , 1996, VLDB.

[27]  Hamid Pirahesh,et al.  Answering complex SQL queries using automatic summary tables , 2000, SIGMOD '00.

[28]  Jonathan Goldstein,et al.  MTCache: Mid-Tier Database Caching for SQL Server , 2004, IEEE Data Eng. Bull..

[29]  Ronald Fagin,et al.  Horn clauses and database dependencies , 1982, JACM.

[30]  Mark Levene,et al.  Why is the snowflake schema a good data warehouse design? , 2003, Inf. Syst..

[31]  Jeffrey D. Ullman,et al.  Answering Queries Using Limited External Query Processors , 1999, J. Comput. Syst. Sci..

[32]  David DeHaan,et al.  Equivalence of nested queries with mixed semantics , 2009, PODS.

[33]  Jonathan Goldstein,et al.  Optimizing queries using materialized views: a practical, scalable solution , 2001, SIGMOD '01.

[34]  Ioana Manolescu,et al.  Query optimization in the presence of limited access patterns , 1999, SIGMOD '99.

[35]  Divesh Srivastava,et al.  Answering Queries Using Views. , 1999, PODS 1995.