A Cost Estimation Technique for Recursive Relational Algebra

With the increasing popularity of data structures such as graphs, recursion is becoming a key ingredient of query languages in analytic systems. Recursive query evaluation involves an iterative application of a function or operation until some condition is satisfied. It is particularly useful for retrieving nodes reachable along deep paths in a graph. The optimization of recursive queries has remained a challenge for decades. Recently, extensions of Codd's classical relational algebra to support recursive terms and their optimisation gained renewed interest [10]. Query optimization crucially relies on enumeration of query evaluation plans and on cost estimation techniques. Cost estimation for recursive terms is far from trivial, and received less attention. In this paper, we propose a new cost estimation technique for recursive terms of the extended relational algebra. This technique allows to select an estimated cheapest query plan, in terms of computing resources usage e.g. memory footprint, CPU and I/O and evaluation time. We evaluate the effectiveness of our cost estimation technique on a set of recursive graph queries on both generated and real datasets of significant size, including Yago: a graph with more than 62 millions edges and 42 million nodes. Experiments show that our cost estimation technique improves the performance of recursive query evaluation on popular relational database engines such as PostgreSQL.

[1]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[2]  Michael Stonebraker,et al.  The design of POSTGRES , 1986, SIGMOD '86.

[3]  StonebrakerMichael,et al.  The design of POSTGRES , 1986 .

[4]  Carlo Zaniolo,et al.  Big Data Analytics with Datalog Queries on Spark , 2016, SIGMOD Conference.

[5]  Daniel Lemire,et al.  Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources , 2018, SIGMOD Conference.

[6]  Surajit Chaudhuri,et al.  An overview of query optimization in relational systems , 1998, PODS.

[7]  Stefano Ceri,et al.  An Overview of Parallel Strategies for Transitive Closure on Algebraic Machines , 1990, PRISMA Workshop.

[8]  George H. L. Fletcher,et al.  Querying Graphs , 2018, Querying Graphs.

[9]  Jeffrey F. Naughton,et al.  Selectivity and Cost Estimation for Joins Based on Random Sampling , 1996, J. Comput. Syst. Sci..

[10]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[11]  Patrick Valduriez,et al.  Optimization of object-oriented recursive queries using cost-controlled strategies , 1992, SIGMOD '92.

[12]  George H. L. Fletcher,et al.  gMark: Schema-Driven Generation of Graphs and Queries , 2015, IEEE Transactions on Knowledge and Data Engineering.

[13]  Volker Markl,et al.  Spinning Fast Iterative Data Flows , 2012, Proc. VLDB Endow..

[14]  Jeffrey D. Ullman,et al.  Map-reduce extensions and recursive queries , 2011, EDBT/ICDT '11.

[15]  Arun N. Swami,et al.  On the Estimation of Join Result Sizes , 1994, EDBT.

[16]  Viktor Leis,et al.  How Good Are Query Optimizers, Really? , 2015, Proc. VLDB Endow..

[17]  Nils Gesbert,et al.  On the Optimization of Recursive Relational Queries: Application to Graph Queries , 2020, SIGMOD Conference.