Data-oriented applications, usually written in a high-level, general-purpose programming language (such as Java), interact with databases through a coarse interface. Informally, the text of a query is built on the application side (either via plain string concatenation or through an abstract notion of statement) and shipped over the wire to the database, where it is executed. The results are then serialized and sent back to the "client code", where they are translated into the language's native datatypes. This round trip is detrimental to performance but, worse, such a programming model prevents richer queries, namely queries containing user-defined functions (that is, functions defined by the programmer and used, e.g., in the filter condition of a SQL query). While some databases also provide a "server-side" language (e.g., PL/SQL in Oracle Database), its integration with the highly optimized query execution engine is still minimal, and queries containing (PL/SQL) user-defined functions remain notoriously inefficient. In this setting, we reviewed existing language-integrated query frameworks, highlighting that existing database query languages (including SQL) share high-level querying primitives (e.g., filtering, joins, aggregation) that can be represented by operators, but differ widely in the semantics of their expression languages. To represent queries in a manner that is agnostic to both the application language and the database, we designed a small calculus, dubbed "QIR" for Query Intermediate Representation. QIR contains expressions, corresponding to a small extension of the pure lambda-calculus, and operators representing the usual querying primitives. To send efficient queries to the database, we abstracted the idea of "good" query representations into a measure on QIR terms. We then designed an evaluation strategy that rewrites QIR query representations into "better" ones.
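To make the last three ideas concrete (expressions plus operators, a measure, and a rewriting strategy), the following is a minimal sketch in Python. It is not the actual QIR definition: the node names (`Var`, `Lam`, `App`, `Const`, `Scan`, `Filter`), the primitives `ge` and `age_of`, and the toy measure (the number of function applications that could still be inlined) are all assumptions made for illustration. The sketch also assumes that bound variable names are distinct, so substitution does not need to avoid capture, and that terms terminate under reduction.

```python
from dataclasses import dataclass

# Expressions: a tiny lambda-calculus fragment (hypothetical node names).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    fn: object
    arg: object

@dataclass(frozen=True)
class Const:
    value: object

# Operators: querying primitives shared by most database languages.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    pred: object
    source: object

def substitute(term, name, value):
    """Replace free occurrences of `name` in `term` by `value`.
    Assumes distinct bound names, so no capture avoidance is performed."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, Lam):
        if term.param == name:  # `name` is shadowed below this binder
            return term
        return Lam(term.param, substitute(term.body, name, value))
    if isinstance(term, App):
        return App(substitute(term.fn, name, value),
                   substitute(term.arg, name, value))
    if isinstance(term, Filter):
        return Filter(substitute(term.pred, name, value),
                      substitute(term.source, name, value))
    return term  # Const and Scan contain no variables

def measure(term):
    """Toy measure: number of applications whose function is a lambda.
    Lower is assumed to be 'better' (less work left outside the operators)."""
    if isinstance(term, App):
        here = 1 if isinstance(term.fn, Lam) else 0
        return here + measure(term.fn) + measure(term.arg)
    if isinstance(term, Lam):
        return measure(term.body)
    if isinstance(term, Filter):
        return measure(term.pred) + measure(term.source)
    return 0

def reduce(term):
    """Rewrite a term by inlining applied lambdas everywhere (beta-reduction)."""
    if isinstance(term, App):
        fn, arg = reduce(term.fn), reduce(term.arg)
        if isinstance(fn, Lam):
            return reduce(substitute(fn.body, fn.param, arg))
        return App(fn, arg)
    if isinstance(term, Lam):
        return Lam(term.param, reduce(term.body))
    if isinstance(term, Filter):
        return Filter(reduce(term.pred), reduce(term.source))
    return term

# A user-defined function applied inside a filter condition.
is_adult = Lam("age", App(App(Var("ge"), Var("age")), Const(18)))
query = Filter(
    Lam("row", App(is_adult, App(Var("age_of"), Var("row")))),
    Scan("people"),
)
better = reduce(query)
```

Here `measure(query)` is 1 and `measure(better)` is 0: after rewriting, the filter predicate no longer calls an opaque application-side function and contains only primitives that a backend could, in principle, translate into its own expression language.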