Comparing columnar, row and array DBMSs to process recursive queries on graphs

Abstract: Analyzing graphs is a fundamental problem in big data analytics, yet one for which DBMS technology does not seem competitive. At the same time, SQL recursive queries are a fundamental mechanism for analyzing graphs in a DBMS, and they are significantly harder to process and optimize than traditional SPJ queries. Columnar DBMSs are a newer, faster class of database system whose storage and query processing mechanisms differ significantly from those of row DBMSs, which remain the dominant technology. With that motivation, we study the optimization of recursive queries on a columnar DBMS, focusing on two fundamental and complementary graph problems: transitive closure and adjacency matrix multiplication. From a query processing perspective, we consider the three fundamental relational operators: selection, projection and join (SPJ), where projection subsumes SQL group-by aggregation. We present comprehensive experiments comparing recursive query processing on columnar, row and array DBMSs to analyze large graphs of different shape and density. We study the relative impact of query optimizations and compare the raw speed of the DBMSs when evaluating recursive queries on graphs. The results confirm that classical query optimizations keep working well in a columnar DBMS, although their relative impact is different. Most importantly, a columnar DBMS with tuned query optimization is uniformly faster than row and array systems at analyzing large graphs, regardless of their shape, density and connectivity. In contrast, there is no clear winner between the row and array DBMSs.
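
To make the two graph problems concrete, the following is a minimal sketch of how transitive closure and one step of adjacency matrix multiplication can be written as SPJ-style SQL. It is an illustration only, not the queries or optimizations evaluated in the paper. It runs on SQLite through Python's standard sqlite3 module, which is a row store and not one of the systems compared here. The table names E and R, the column names i, j and v, and both function names are assumptions introduced for this example.

```python
import sqlite3


def _load_edges(edges):
    """Create an in-memory edge table E(i, j, v); v = 1 marks an edge.

    The schema is a hypothetical one chosen for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE E (i INTEGER, j INTEGER, v INTEGER)")
    conn.executemany("INSERT INTO E VALUES (?, ?, 1)", edges)
    return conn


def transitive_closure(edges):
    """Return all (i, j) pairs such that j is reachable from i.

    The recursive step joins the partial result R with E on R.j = E.i.
    UNION (rather than UNION ALL) removes duplicates, so the recursion
    also terminates on graphs with cycles.
    """
    conn = _load_edges(edges)
    rows = conn.execute(
        """
        WITH RECURSIVE R(i, j) AS (
            SELECT i, j FROM E
            UNION
            SELECT R.i, E.j FROM R JOIN E ON R.j = E.i
        )
        SELECT i, j FROM R ORDER BY i, j
        """
    ).fetchall()
    conn.close()
    return rows


def two_hop_path_counts(edges):
    """Return (i, j, count) for E x E: the number of length-2 paths.

    One matrix multiplication is a join on the shared vertex followed
    by a group-by aggregation, which the abstract treats as projection.
    """
    conn = _load_edges(edges)
    rows = conn.execute(
        """
        SELECT a.i, b.j, SUM(a.v * b.v)
        FROM E AS a JOIN E AS b ON a.j = b.i
        GROUP BY a.i, b.j
        ORDER BY a.i, b.j
        """
    ).fetchall()
    conn.close()
    return rows
```

Calling transitive_closure on a small edge list returns every reachable pair, while two_hop_path_counts returns the nonzero entries of the squared adjacency matrix. Repeating the join and aggregation step k times would yield counts of paths of length k + 1, which is where the choice of join order, early selection and duplicate elimination starts to matter.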
