Empowering In-Memory Relational Database Engines with Native Graph Processing

The plethora of graph and relational data gives rise to many interesting graph-relational queries in various domains, e.g., finding related proteins that satisfy relational predicates in a biological network. The maturity of RDBMSs has motivated academia and industry to invest in leveraging RDBMSs for graph processing, an approach whose efficiency has been demonstrated for important graph queries. However, none of these efforts processes graphs natively inside the RDBMS, which is particularly challenging due to the impedance mismatch between the relational and the graph models. In this paper, we propose to treat graphs as first-class citizens inside the relational engine so that operations on graphs are executed natively inside the RDBMS. We realize our approach inside VoltDB, an open-source in-memory relational database, and name this realization GRFusion. The SQL dialect and the query engine of GRFusion are extended to declaratively define graphs and to execute cross-data-model query plans composed of graph and relational operators, yielding query-time speedups of up to four orders of magnitude over state-of-the-art approaches.
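To make the idea of a cross-data-model query plan concrete, the Python sketch below shows one plausible way a graph view could be layered over two relational tables. The view keeps the graph topology as adjacency lists whose entries point back to the original rows, and a traversal operator evaluates relational predicates on vertex and edge attributes while it walks the graph, rather than issuing one relational self-join per hop. This is an assumption made for illustration. The tables, the data, the `GraphView` class and its `reachable` method are all hypothetical and are not GRFusion's SQL syntax, its internal data structures, or VoltDB's API.

```python
from collections import deque

# Hypothetical relational tables (illustrative data, not from the paper).
# proteins(id, name, organism)
proteins = [
    (1, "P1", "human"),
    (2, "P2", "human"),
    (3, "P3", "mouse"),
    (4, "P4", "human"),
    (5, "P5", "human"),
]
# interactions(src, dst, confidence)
interactions = [
    (1, 2, 0.9),
    (2, 3, 0.8),
    (3, 4, 0.95),
    (2, 5, 0.4),
    (1, 4, 0.7),
]


class GraphView:
    """Hypothetical graph view over two relations: the topology is kept as
    adjacency lists whose entries reference the original edge tuples, and
    vertex ids map back to the original vertex tuples."""

    def __init__(self, vertex_rows, edge_rows):
        self.vertex = {row[0]: row for row in vertex_rows}  # id -> vertex tuple
        self.adj = {vid: [] for vid in self.vertex}  # id -> [(neighbor id, edge tuple)]
        for edge in edge_rows:
            src, dst = edge[0], edge[1]
            # Undirected view: index every edge from both endpoints.
            self.adj[src].append((dst, edge))
            self.adj[dst].append((src, edge))

    def reachable(self, start, max_hops, vertex_pred, edge_pred):
        """Graph operator: breadth-first traversal from `start` for at most
        `max_hops` hops. Predicates are evaluated during the walk, so a vertex
        or edge that fails its predicate is never expanded.
        Returns (vertex id, hop count) pairs in BFS order."""
        seen = {start}
        frontier = deque([(start, 0)])
        result = []
        while frontier:
            vid, hops = frontier.popleft()
            if hops == max_hops:
                continue  # hop limit reached; do not expand further
            for nbr, edge in self.adj[vid]:
                if nbr in seen or not edge_pred(edge):
                    continue
                if not vertex_pred(self.vertex[nbr]):
                    continue
                seen.add(nbr)
                result.append((nbr, hops + 1))
                frontier.append((nbr, hops + 1))
        return result


# Cross-data-model plan: relational selection -> graph traversal -> projection.
view = GraphView(proteins, interactions)
start = next(p[0] for p in proteins if p[1] == "P1")  # relational selection
hits = view.reachable(
    start,
    max_hops=2,
    vertex_pred=lambda v: v[2] == "human",  # predicate on vertex attributes
    edge_pred=lambda e: e[2] >= 0.5,  # predicate on edge attributes
)
names = [view.vertex[vid][1] for vid, _ in hits]  # relational projection
print(names)
```

In this toy run the plan returns `['P2', 'P4']`: P3 is rejected by the organism predicate, and P5 is unreachable because its only interaction falls below the confidence threshold. Because both predicates are checked inside the traversal, pruned vertices are never expanded. This is the kind of operator-level interplay that a plan mixing graph and relational operators can exploit.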
