Optimizing Cursor Loops in Relational Databases

Loops that iterate over SQL query results are quite common, both in application programs that run outside the DBMS, as well as User Defined Functions (UDFs) and stored procedures that run within the DBMS. It can be argued that set-oriented operations are more efficient and should be preferred over iteration; but from real-world use cases, it is clear that loops over query results are inevitable in many situations, and are preferred by many users. Such loops, known as cursor loops, come with huge trade-offs and overheads w.r.t. performance, resource consumption and concurrency. We present Aggify, a technique for optimizing loops over query results that overcomes all these overheads. It achieves this by automatically generating custom aggregates that are equivalent in semantics to the loop. Thereby, Aggify completely eliminates the loop by rewriting the query to use this generated aggregate. This technique has several advantages such as: (i) pipelining of entire cursor loop operations instead of materialization, (ii) pushing down loop computation from the application layer into the DBMS, closer to the data, (iii) leveraging existing work on optimization of aggregate functions, resulting in efficient query plans. We describe the technique underlying Aggify and present our experimental evaluation over benchmarks as well as real workloads that demonstrate the significant benefits of this technique.

[1]  David J. DeWitt,et al.  A transformation-based approach to optimizing loops in database programming languages , 1992, SIGMOD '92.

[2]  FerranteJeanne,et al.  The program dependence graph and its use in optimization , 1987 .

[3]  S. Sudarshan,et al.  Cobra: A Framework for Cost-Based Rewriting of Database Applications , 2018, 2018 IEEE 34th International Conference on Data Engineering (ICDE).

[4]  Alvin Cheung,et al.  Automatic Partitioning of Database Applications , 2012, Proc. VLDB Endow..

[5]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[6]  Karthik Ramachandra,et al.  Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates , 2020, SIGMOD Conference.

[7]  Joe D. Warren,et al.  The program dependence graph and its use in optimization , 1987, TOPL.

[8]  Kwanghyun Park,et al.  Froid: Optimization of Imperative Programs in a Relational Database , 2017, Proc. VLDB Endow..

[9]  Alvin Cheung,et al.  Optimizing database-backed applications with query synthesis , 2013, PLDI.

[10]  Xiaodong Zhang,et al.  SQLoop: High Performance Iterative Processing in Data Management , 2018, 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS).

[11]  Per-Åke Larson,et al.  Eager Aggregation and Lazy Aggregation , 1995, VLDB.

[12]  Torsten Grust,et al.  Compiling PL/SQL Away , 2019, CIDR.

[13]  S. Sudarshan,et al.  Decorrelation of user defined function invocations in queries , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[14]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[15]  S. Sudarshan,et al.  Rewriting procedures for batched bindings , 2008, Proc. VLDB Endow..

[16]  César A. Galindo-Legaria,et al.  Orthogonal optimization of subqueries and aggregation , 2001, SIGMOD '01.

[17]  Manu Sridharan,et al.  Translating imperative code to MapReduce , 2014, OOPSLA 2014.

[18]  Maaz Bin Safeer Ahmad,et al.  Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications , 2018, SIGMOD Conference.

[19]  S. Sudarshan,et al.  Extracting Equivalent SQL from Imperative Code in Database Applications , 2016, SIGMOD Conference.