Distributed Evaluation of Graph Queries using Recursive Relational Algebra

We present Dist-μ-RA, a system for the distributed evaluation of recursive graph queries. Dist-μ-RA builds on the recursive relational algebra (μ-RA) and extends it with evaluation plans suited to the distributed setting. The goal is to support expressive high-level queries while remaining efficient at scale and keeping communication costs low. Experimental results on both real and synthetic graphs show the effectiveness of the proposed approach compared with existing systems.
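To make the recursive operator concrete, here is a minimal Python sketch. It assumes a linear fixpoint of the form μ(X = R ∪ X ∘ E), which computes the transitive closure of an edge relation E when R = E. `fixpoint` evaluates it semi-naively: at each iteration it joins only the newly derived tuples (the delta) with E. `partitioned_fixpoint` shows one way a distributed plan can avoid communication inside the loop. The constant part R is hash-partitioned by source node, each simulated worker iterates on its own partition against a replicated copy of E, and the partial results are unioned at the end. All function names and the partitioning scheme are hypothetical illustrations. They are not the actual Dist-μ-RA plans or API, and the workers here run sequentially in one process.

```python
from collections import defaultdict


def index_by_src(edges):
    """Index E by source node so the join X ∘ E is a dictionary lookup."""
    idx = defaultdict(list)
    for t, u in edges:
        idx[t].append(u)
    return idx


def compose(delta, edges_by_src):
    """Relational composition delta ∘ E = {(s, u) | (s, t) in delta, (t, u) in E}."""
    out = set()
    for s, t in delta:
        for u in edges_by_src.get(t, ()):
            out.add((s, u))
    return out


def fixpoint(constant, edges):
    """Semi-naive evaluation of mu(X = constant ∪ X ∘ E).

    Only the tuples derived in the previous iteration are joined with E,
    and the loop stops when an iteration derives nothing new.
    """
    idx = index_by_src(edges)
    result = set(constant)
    delta = set(constant)
    while delta:
        new = compose(delta, idx) - result
        result |= new
        delta = new
    return result


def partitioned_fixpoint(edges, num_workers):
    """Hypothetical communication-free plan for transitive closure.

    The constant part (here E itself) is hash-partitioned by source node.
    Each simulated worker runs the whole loop on its partition with a full,
    replicated copy of E, so no tuples move between workers while iterating.
    """
    parts = [set() for _ in range(num_workers)]
    for s, t in edges:
        parts[hash(s) % num_workers].add((s, t))
    result = set()
    for part in parts:  # each iteration stands in for one independent worker
        result |= fixpoint(part, edges)
    return result


if __name__ == "__main__":
    edges = {(1, 2), (2, 3), (3, 1), (3, 4)}
    closure = fixpoint(edges, edges)
    print(sorted(closure))
    print(closure == partitioned_fixpoint(edges, 3))
```

The partitioned variant is correct for this query because every path from a node s begins with an edge whose source is s. A worker therefore never needs another worker's intermediate tuples, and the only data movement is replicating E up front and collecting the union. A real system still has to weigh the cost of replicating E against the cost of shuffling deltas between iterations.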
