On the Optimization of Recursive Relational Queries: Application to Graph Queries

Graph databases have received a lot of attention because they are particularly useful in applications such as social networks, life sciences and the semantic web. Various languages have emerged to query graph databases, many of which embed forms of recursion, which proves essential for navigating graphs. The relational model has benefited from a huge body of research over the last half century, which is why many graph databases rely on the techniques of relational query engines. Since its introduction, the relational model has seen various attempts to extend it with recursion, and it is now possible to use recursion in several SQL- or Datalog-based database systems. The optimization of recursive queries remains, however, a challenge. We propose mu-RA, a variation of the Relational Algebra equipped with a fixpoint operator for expressing recursive relational queries. mu-RA can notably express unions of conjunctive regular path queries. Leveraging the fact that this fixpoint operator makes recursive terms more amenable to algebraic transformations, we propose new rewrite rules. These rules make it possible to generate new query execution plans that cannot be obtained with previous approaches. We present the syntax and semantics of mu-RA, and the rewrite rules that we specifically devised to tackle the optimization of recursive queries. We report on practical experiments showing that the newly generated plans can provide significant performance improvements for evaluating recursive queries over graphs.
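To make the idea of a fixpoint operator and of an algebraic rewrite on a recursive term more concrete, the following Python sketch may help. It is only an illustration under stated assumptions, not the mu-RA syntax or any rule from the paper: the names `fixpoint`, `compose` and the toy `edges` relation are hypothetical, relations are modeled as sets of pairs, and the semi-naive loop assumes the recursive step distributes over union (a linear recursion). It computes the transitive closure of a small graph and checks that filtering on the source node after the fixpoint gives the same answer as pushing the filter into the constant part of the fixpoint.

```python
from typing import Callable, FrozenSet, Tuple

Pair = Tuple[str, str]
Relation = FrozenSet[Pair]


def compose(left: Relation, right: Relation) -> Relation:
    """Relational composition: join left.target = right.source, keep the ends."""
    return frozenset((x, z) for (x, y) in left for (y2, z) in right if y == y2)


def fixpoint(base: Relation, step: Callable[[Relation], Relation]) -> Relation:
    """Least solution of X = base ∪ step(X).

    Assumption: `step` distributes over union, so it is enough to apply it
    to the tuples discovered in the previous round (semi-naive iteration).
    """
    result = set(base)
    delta = set(base)
    while delta:
        new = set(step(frozenset(delta))) - result
        result |= new
        delta = new
    return frozenset(result)


# Hypothetical toy graph containing a cycle b -> c -> d -> b.
edges: Relation = frozenset({("a", "b"), ("b", "c"), ("c", "d"), ("d", "b")})


def extend(x: Relation) -> Relation:
    """Recursive step: append one more edge to every known path."""
    return compose(x, edges)


# Plan 1: compute the whole transitive closure, then filter on the source.
closure = fixpoint(edges, extend)
filter_after = frozenset(p for p in closure if p[0] == "a")

# Plan 2: push the filter into the constant part of the fixpoint.
# This is valid here because `extend` never changes the source column.
filter_pushed = fixpoint(frozenset(p for p in edges if p[0] == "a"), extend)

print(sorted(filter_after))
print(filter_after == filter_pushed)
```

Both plans return the same pairs, but the second one never materializes paths that start from nodes other than `a`, which is the kind of saving an optimizer can look for when recursion is exposed as an algebraic operator.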
