Graph Processing in RDBMSs

To support analytics on massive graphs such as online social networks, RDF graphs, and the Semantic Web, many new graph algorithms have been designed to query graphs for specific problems, and many distributed graph processing systems have been developed to support graph querying through programming. A main issue to be addressed is how an RDBMS can support graph processing, and the first question is how it can do so at the SQL level. Our work is motivated by two facts: in real applications, many relations stored in an RDBMS are closely related to a graph and need to be used together with it to query the graph, and an RDBMS is a system that can query and manage data while the data is updated over time. To support graph processing, we propose 4 new relational algebra operations: MM-join, MV-join, anti-join, and union-by-update. Here, MM-join and MV-join are join operations between two matrices and between a matrix and a vector, respectively, followed by aggregation over groups, given that a matrix or vector can be represented by a relation. Both operate over a semiring, by which many graph algorithms can be supported. The anti-join removes nodes and edges from a graph when they are unnecessary for subsequent computation. The union-by-update handles value updates, as needed to compute PageRank, for example. The 4 new relational algebra operations can be defined by the 6 basic relational algebra operations with group-by and aggregation. We revisit SQL recursive queries and show that the 4 operations, together with others, are ensured to have a fixpoint, following the techniques studied in Datalog, and we enhance the recursive WITH clause in SQL'99. We show that RDBMSs are capable of dealing with graph processing in reasonable time.
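To make the operations concrete, the following is a minimal Python sketch, not the SQL formulation the paper proposes. It assumes an edge relation of `(src, dst, weight)` tuples and a vector relation stored as a dict from node id to value; the function names (`mv_join`, `anti_join`, `union_by_update`, `sssp`), the join orientation, and the stopping test are illustrative assumptions. The MV-join is instantiated with the (min, +) semiring to compute single-source shortest paths, and the loop stops when an iteration changes nothing, that is, at a fixpoint.

```python
def mv_join(matrix, vector, add, mul):
    """Join matrix tuples (src, dst, w) with vector entries on src == id,
    group by dst, and aggregate the `mul` products with `add`.
    (Column roles are an assumption made for this sketch.)"""
    out = {}
    for src, dst, w in matrix:
        if src in vector:
            term = mul(w, vector[src])
            out[dst] = add(out[dst], term) if dst in out else term
    return out


def anti_join(relation, keys, key):
    """Keep only the tuples of `relation` whose key has no match in `keys`."""
    return [t for t in relation if key(t) not in keys]


def union_by_update(old, new):
    """Keep all entries of `old`, overwrite values for ids that also occur
    in `new`, and add the ids that occur only in `new`."""
    merged = dict(old)
    merged.update(new)
    return merged


def sssp(edges, source):
    """Single-source shortest paths as repeated MV-joins over (min, +)."""
    dist = {source: 0.0}
    while True:
        candidate = mv_join(edges, dist, add=min, mul=lambda w, d: w + d)
        # Keep only strictly better values (a choice made for this sketch
        # so that the loop reaches a fixpoint).
        improved = {n: d for n, d in candidate.items()
                    if n not in dist or d < dist[n]}
        if not improved:
            return dist
        dist = union_by_update(dist, improved)


edges = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
distances = sssp(edges, "a")
# Drop all edges leaving node "b", e.g. once "b" is no longer needed.
pruned = anti_join(edges, {"b"}, key=lambda e: e[0])
```

Swapping the `add` and `mul` arguments changes the algorithm without changing the join: for instance, `add=lambda x, y: x + y` with `mul=lambda w, v: w * v` gives an ordinary matrix-vector product, the core step of a PageRank-style iteration.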
