A Formal Semantics of SQL Queries, Its Validation, and Applications

While formal semantics of theoretical languages underlying SQL have been provided in the past, they all made simplifying assumptions ranging from changes in the syntax to omitting bag semantics and nulls. This situation is reminiscent of what happens in the field of programming languages, where semantics of formal calculi underlying the main features of languages are abundant, but formal semantics of real languages that people use are few and far between. We consider the basic class of SQL queries --- essentially SELECT-FROM-WHERE queries with subqueries, set/bag operations, and nulls --- and define a formal semantics for it, without any departures from the real language. This fragment already requires decisions related to the data model and handling variable names that are normally disregarded by simplified semantics. To justify our choice of the semantics, we validate it experimentally on a large number of randomly generated queries and databases. We give two applications of the semantics. One is the first formal proof of the equivalence of basic SQL and relational algebra that extends to bag semantics and nulls. The other application looks at the three-valued logic employed by SQL, which is universally assumed to be necessary to handle nulls. We prove however that this is not so, as three-valued logic does not add expressive power: every SQL query in our fragment can be evaluated under the usual two-valued Boolean semantics of conditions.

[1]  E. F. Codd,et al.  A data base sublanguage founded on the relational calculus , 1971, SIGFIDET '71.

[2]  John C. Mitchell,et al.  Concepts in programming languages , 2002 .

[3]  Robin Milner,et al.  Commentary on standard ML , 1990 .

[4]  Joseph Albert,et al.  Algebraic Properties of Bag Data Types , 1991, VLDB.

[5]  J. Gregory Morrisett,et al.  Toward a verified relational database management system , 2010, POPL '10.

[6]  Giuseppe Pelagatti,et al.  Formal semantics of SQL queries , 1991, TODS.

[7]  Michael Norrish C formalised in HOL , 1998 .

[8]  Leonid Libkin,et al.  Making SQL Queries Correct on Incomplete Databases: A Feasibility Study , 2016, PODS.

[9]  Marshall Swain,et al.  Topics in Philosophical Logic , 1971 .

[10]  Jennifer Widom,et al.  Database Systems: The Complete Book , 2001 .

[11]  Harold Abelson,et al.  Revised5 report on the algorithmic language scheme , 1998, SIGP.

[12]  C. J. Date A Guide to the SQL Standard , 1987 .

[13]  Jan Van den Bussche,et al.  Translating SQL into the Relational Algebra , 2009 .

[14]  Raghu Ramakrishnan,et al.  Database Management Systems , 1976 .

[15]  Limsoon Wong,et al.  Query Languages for Bags and Aggregate Functions , 1997, J. Comput. Syst. Sci..

[16]  Charles McEwen Ellison,et al.  A formal semantics of C with applications , 2012 .

[17]  Nikolaos Papaspyrou,et al.  A Formal Semantics for the C Programming Language , 2000 .

[18]  Yuri Gurevich,et al.  The Semantics of the C Programming Language , 1992, CSL.

[19]  Alvin Cheung,et al.  HoTTSQL: proving query rewrites with univalent SQL semantics , 2016, PLDI.

[20]  R. Kent Dybvig,et al.  Revised6 Report on the Algorithmic Language Scheme , 2009 .

[21]  Carl A. Gunter Semantics of programming languages: structures and techniques , 1993, Choice Reviews Online.

[22]  Serge Abiteboul,et al.  Foundations of Databases , 1994 .

[23]  Margus Veanes,et al.  Qex: Symbolic SQL Query Explorer , 2010, LPAR.

[24]  Robin Milner,et al.  Definition of standard ML , 1990 .

[25]  Kevin E. Kline,et al.  SQL in a Nutshell , 2000 .

[26]  Guido Moerkotte,et al.  Optimization and Evaluation of Disjunctive Queries , 2000, IEEE Trans. Knowl. Data Eng..

[27]  Tova Milo,et al.  Towards Tractable Algebras for Bags , 1996, J. Comput. Syst. Sci..

[28]  Robert Harper,et al.  Practical Foundations for Programming Languages , 2012 .

[29]  Georg Gottlob,et al.  Translating SQL Into Relational Algebra: Optimization, Semantics, and Equivalence of SQL Queries , 1985, IEEE Transactions on Software Engineering.

[30]  Alvin Cheung,et al.  Cosette: An Automated Prover for SQL , 2017, CIDR.

[31]  Grzegorz Malinowski,et al.  Many-valued logic and its philosophy , 2007, The Many Valued and Nonmonotonic Turn in Logic.

[32]  Leopoldo E. Bertossi,et al.  Consistency and trust in peer data exchange systems , 2016, Theory and Practice of Logic Programming.