Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries

Deciding the equivalence of SQL queries is a fundamental problem in data management. As prior work has mainly focused on studying the theoretical limitations of the problem, very few implementations for checking such equivalences exist. In this paper, we present a new formalism and implementation for reasoning about the equivalences of SQL queries. Our formalism, U-semiring, extends SQL's semiring semantics with unbounded summation and duplicate elimination. U-semiring is defined using only very few axioms and can thus be easily implemented using proof assistants such as Coq for automated query reasoning. Yet, they are sufficient enough to enable us reason about sophisticated SQL queries that are evaluated over bags and sets, along with various integrity constraints. To evaluate the effectiveness of U-semiring, we have used it to formally verify 39 query rewrite rules from both classical data management research papers and real-world SQL engines, where many of them have never been proven correct before.

[1]  Alvin Cheung,et al.  Optimizing database-backed applications with query synthesis , 2013, PLDI.

[2]  E. F. Codd,et al.  A relational model of data for large shared data banks , 1970, CACM.

[3]  Akash Lal,et al.  Optimizing Big-Data Queries Using Program Synthesis , 2017, SOSP.

[4]  C. J. Date A Guide to the SQL Standard , 1987 .

[5]  Marvin H. Solomon,et al.  The GMAP: a versatile tool for physical data independence , 1996, The VLDB Journal.

[6]  Val Tannen,et al.  An Equational Chase for Path-Conjunctive Queries, Constraints, and Views , 1999, ICDT.

[7]  A. Mostowski Review: B. A. Trahtenbrot, Impossibility of an Algorithm for the Decision Problem in Finite Classes , 1950, Journal of Symbolic Logic.

[8]  Alin Deutsch,et al.  Chase & Backchase: A Method for Query Optimization With Materialized Views and Integrity Constraints , 2001 .

[9]  Irving L. Traiger,et al.  A history and evaluation of System R , 1981, CACM.

[10]  Michel Minoux,et al.  Graphs, dioids and semirings : new models and algorithms , 2008 .

[11]  Harry K. T. Wong,et al.  Optimization of nested SQL queries revisited , 1987, SIGMOD '87.

[12]  Boris Motik,et al.  Benchmarking the Chase , 2017, PODS.

[13]  Mihalis Yannakakis,et al.  Equivalences Among Relational Expressions with the Union and Difference Operators , 1980, J. ACM.

[14]  Connolly,et al.  Database Systems , 2004 .

[15]  Andrey Gubarev,et al.  Dremel : Interactive Analysis of Web-Scale Datasets , 2011 .

[16]  Val Tannen,et al.  Provenance semirings , 2007, PODS.

[17]  Greg Nelson,et al.  Fast Decision Procedures Based on Congruence Closure , 1980, JACM.

[18]  Margus Veanes,et al.  Symbolic Query Exploration , 2009, ICFEM.

[19]  Dan Suciu,et al.  The Myria Big Data Management and Analytics System and Cloud Services , 2017, CIDR.

[20]  Ronald Fagin,et al.  Data exchange: semantics and query answering , 2003, Theor. Comput. Sci..

[21]  Ashok K. Chandra,et al.  Optimal implementation of conjunctive queries in relational data bases , 1977, STOC '77.

[22]  Isil Dillig,et al.  Verifying equivalence of database-driven applications , 2017, Proc. ACM Program. Lang..

[23]  Goetz Graefe The Cascades Framework for Query Optimization , 1995, IEEE Data Eng. Bull..

[24]  Alin Deutsch,et al.  Physical Data Independence, Constraints, and Optimization with Universal Plans , 1999, VLDB.

[25]  Serge Abiteboul,et al.  Foundations of Databases , 1994 .

[26]  Cliff B. Jones,et al.  The tactic language , 1991 .

[27]  Yuri Gurevich,et al.  The Classical Decision Problem , 1997, Perspectives in Mathematical Logic.

[28]  C. J. Date A Guide to the SQL Standard, Second Edition , 1989 .

[29]  Joseph K. Bradley,et al.  Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.

[30]  Alvin Cheung,et al.  Cosette: An Automated Prover for SQL , 2017, CIDR.

[31]  Noam Rinetzky,et al.  Verifying Equivalence of Spark Programs , 2017, CAV.

[32]  Alin Deutsch,et al.  A chase too far? , 2000, SIGMOD '00.

[33]  Jeremy Avigad,et al.  A metaprogramming framework for formal verification , 2017, Proc. ACM Program. Lang..

[34]  Werner Nutt,et al.  Rewriting aggregate queries using views , 1999, PODS.

[35]  Alvin Cheung,et al.  HoTTSQL: proving query rewrites with univalent SQL semantics , 2016, PLDI.

[36]  Hamid Pirahesh,et al.  Cost-based optimization for magic: algebra and implementation , 1996, SIGMOD '96.

[37]  Alvin Cheung,et al.  Demonstration of the Cosette Automated SQL Prover , 2017, SIGMOD Conference.

[38]  Margus Veanes,et al.  Qex: Symbolic SQL Query Explorer , 2010, LPAR.

[39]  Leonid Libkin,et al.  A Formal Semantics of SQL Queries, Its Validation, and Applications , 2017, Proc. VLDB Endow..

[40]  Phokion G. Kolaitis,et al.  The containment problem for Real conjunctive queries with inequalities , 2006, PODS '06.

[41]  Joshua S. Auerbach,et al.  Handling Environments in a Nested Relational Algebra with Combinators and an Implementation in a Verified Query Compiler , 2017, SIGMOD Conference.

[42]  Raghu Ramakrishnan,et al.  Containment of conjunctive queries: beyond relations as sets , 1995, TODS.

[43]  Dan Suciu,et al.  Demonstration of the Myria big data management service , 2014, SIGMOD Conference.

[44]  A. Wilkie THE CLASSICAL DECISION PROBLEM (Perspectives in Mathematical Logic) By Egon Börger, Erich Grädel and Yuri Gurevich: 482 pp., DM.158.–, ISBN 3 540 57073 X (Springer, 1997). , 1998 .

[45]  Surajit Chaudhuri,et al.  Optimization of real conjunctive queries , 1993, PODS '93.

[46]  Hamid Pirahesh,et al.  Extensible/rule based query rewrite optimization in Starburst , 1992, SIGMOD '92.

[47]  C. J. Date A guide to the SQL standard (2nd ed.) , 1989 .

[48]  Jeremy Avigad,et al.  The Lean Theorem Prover (System Description) , 2015, CADE.