Towards compiling graph queries in relational engines

The increasing demand for graph query processing has prompted the addition of support for graph workloads on top of standard relational database management systems (RDBMS). Although this seems like a good idea (after all, graphs are just relations), performance is typically suboptimal: graph workloads are naturally iterative and rely extensively on efficient traversal of adjacency structures, which an RDBMS usually does not implement. Adding such specialized adjacency structures is far from straightforward due to the complexity of typical RDBMS implementations. The iterative nature of graph queries also practically requires a form of runtime compilation and native code generation, which adds another dimension of complexity to the RDBMS implementation and to any potential extensions. In this paper, we demonstrate how the first Futamura projection, which links interpreted query engines and compilers through specialization, can be applied to compile graph workloads efficiently in a way that simplifies building relational engines that also support graph workloads. We extend the LB2 main-memory query compiler with graph adjacency structures and operators, and we implement evaluation of a subset of the Datalog logic query language to process graph and recursive queries efficiently. The resulting graph extension matches, and sometimes outperforms, best-of-breed low-level graph engines.
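
The first Futamura projection says that specializing an interpreter with respect to a fixed program yields a compiled version of that program. For queries, the static input is the query plan and the dynamic input is the data. The Python sketch below illustrates the idea on a hypothetical three-operator plan language. The names `Scan`, `Filter`, `Project`, `stage` and `compile_plan` are invented for illustration. `interpret` dispatches on the plan tree every time it runs. `stage` performs the same walk only once and emits source for a residual `query` function in which no plan dispatch remains. This is not LB2's API, and a real query compiler would emit native code rather than Python source.

```python
from dataclasses import dataclass


# Hypothetical three-operator plan language (illustration only).
@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: object
    column: str
    value: object


@dataclass
class Project:
    child: object
    columns: list


def interpret(plan, db):
    """Plain interpreter: dispatches on the plan tree on every run."""
    if isinstance(plan, Scan):
        return list(db[plan.table])
    if isinstance(plan, Filter):
        return [r for r in interpret(plan.child, db) if r[plan.column] == plan.value]
    if isinstance(plan, Project):
        return [{c: r[c] for c in plan.columns} for r in interpret(plan.child, db)]
    raise TypeError(f"unknown operator: {plan!r}")


def indent(code):
    return "".join("    " + line + "\n" for line in code.splitlines())


def stage(plan, consume):
    """Staged interpreter: same dispatch, but it runs once and returns source code.
    `consume(var)` returns the code that handles one row bound to `var`
    (push-based, so all operators fuse into a single loop)."""
    if isinstance(plan, Scan):
        return f"for row in db[{plan.table!r}]:\n" + indent(consume("row"))
    if isinstance(plan, Filter):
        return stage(plan.child, lambda v: f"if {v}[{plan.column!r}] == {plan.value!r}:\n" + indent(consume(v)))
    if isinstance(plan, Project):
        def project(v):
            fields = ", ".join(f"{c!r}: {v}[{c!r}]" for c in plan.columns)
            return f"out = {{{fields}}}\n" + consume("out")
        return stage(plan.child, project)
    raise TypeError(f"unknown operator: {plan!r}")


def compile_plan(plan):
    """Specialize the interpreter to `plan`; the residual program has no plan dispatch."""
    body = stage(plan, lambda v: f"result.append({v})")
    src = "def query(db):\n    result = []\n" + indent(body) + "    return result\n"
    namespace = {}
    exec(src, namespace)
    return namespace["query"], src


if __name__ == "__main__":
    plan = Project(Filter(Scan("edges"), "src", 1), ["dst"])
    query, src = compile_plan(plan)
    print(src)
```

For `Project(Filter(Scan("edges"), "src", 1), ["dst"])`, the generated `query` has a specific shape. It is one loop over `db['edges']` with the predicate and the projection inlined. This is the kind of residual program that specializing the interpreter is meant to produce.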

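Without a prebuilt adjacency index, each round of a recursive graph query must either rescan the edge relation or rebuild a join hash table over it. The sketch below assumes dense integer node IDs `0..n-1`. It builds a compressed sparse row (CSR) adjacency index and uses it to evaluate reachability, the canonical recursive Datalog query. Evaluation is semi-naive: each round joins only the facts derived in the previous round (the delta) with `edge`. CSR and semi-naive evaluation are standard techniques chosen here for illustration. The names `build_csr` and `reachability` are hypothetical and do not describe the paper's actual operators.

```python
def build_csr(num_nodes, edges):
    """Build a compressed sparse row (CSR) adjacency index.
    Assumes node IDs are dense integers 0..num_nodes-1. The out-neighbors
    of v are targets[offsets[v]:offsets[v + 1]]."""
    offsets = [0] * (num_nodes + 1)
    for src, _ in edges:
        offsets[src + 1] += 1            # out-degree of src, shifted by one slot
    for v in range(num_nodes):
        offsets[v + 1] += offsets[v]     # prefix sums give each node's start index
    targets = [0] * len(edges)
    cursor = offsets[:-1]                # copy: next free slot for each node
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def reachability(num_nodes, edges):
    """Semi-naive evaluation of the recursive Datalog program
        reach(x, y) :- edge(x, y).
        reach(x, y) :- reach(x, z), edge(z, y).
    Each round joins only the facts derived in the previous round (delta)
    with edge, probing the CSR index instead of scanning the edge relation."""
    offsets, targets = build_csr(num_nodes, edges)
    reach = set(edges)
    delta = set(edges)
    while delta:                         # stop at the fixpoint: no new facts
        new = set()
        for x, z in delta:
            for i in range(offsets[z], offsets[z + 1]):
                fact = (x, targets[i])
                if fact not in reach:
                    new.add(fact)
        reach |= new
        delta = new
    return reach


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (3, 1)]
    print(sorted(reachability(4, edges)))
```

For the edges `(0, 1), (1, 2), (2, 0), (3, 1)`, evaluation derives all 12 reachable pairs and then stops. The inner `range(offsets[z], offsets[z + 1])` loop touches only the out-neighbors of `z`, so each round's join costs time proportional to the delta's fan-out rather than to the size of the whole edge relation.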