The Case Against Specialized Graph Analytics Engines

Graph analytic processing is becoming a nearly ubiquitous component of the enterprise data analytics ecosystem. In response to this growing need, various specialized graph processing engines have been created in recent years. Sadly, the use of relational database management systems (RDBMSs) for graph processing is largely ignored in enterprise settings. This oversight is surprising, since RDBMSs are already present in most enterprises and are used for a variety of other analytic tasks. This situation raises the question of whether using an RDBMS for graph processing is fundamentally lacking in some respect compared to the specialized graph processing engines. In this paper, we address this question from both the programmer productivity perspective and the performance perspective. We present Grail, a syntactic layer for querying graphs in a vertex-centric way in an RDBMS; Grail programs are compiled into SQL. In a single-node setting, we compare Grail to GraphLab and Giraph and examine the performance implications of using Grail, showing that the RDBMS engine is competitive with these specialized engines. Given that RDBMSs are ubiquitous in enterprise settings, rest on robust and mature technology that has been hardened over decades, and are already part of existing administrative practices, we argue that it is time to reconsider whether specialized graph engines have a role to play in most enterprises.
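To make the idea of running a vertex-centric program on a relational engine more concrete, the sketch below expresses single-source shortest paths as a loop of Pregel-style supersteps, each implemented as ordinary SQL statements. This is not the SQL that Grail generates (the abstract does not show it); the function `sssp_supersteps`, the tables `edges`, `vertices`, and `messages`, and the use of Python's built-in `sqlite3` as a stand-in RDBMS are all illustrative assumptions. Message passing becomes a join between vertex state and edges, message combining becomes `GROUP BY` with `MIN`, and the vertex update becomes `INSERT`/`UPDATE` statements.

```python
import sqlite3


def sssp_supersteps(edges, source):
    """Hypothetical sketch: single-source shortest paths as SQL supersteps.

    edges: iterable of (src, dst, weight) tuples with non-negative weights.
    Returns {vertex_id: distance} for every vertex reachable from source.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, weight REAL)")
    cur.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)
    # Vertex state: best distance so far; 'active' marks vertices whose value
    # changed in the previous superstep (only they send messages next).
    cur.execute(
        "CREATE TABLE vertices (id INTEGER PRIMARY KEY, dist REAL, active INTEGER)"
    )
    cur.execute("INSERT INTO vertices VALUES (?, 0, 1)", (source,))

    while True:
        # Message passing as a join: each active vertex sends dist + weight
        # along every out-edge; messages to the same target are combined by MIN.
        cur.execute("DROP TABLE IF EXISTS messages")
        cur.execute(
            """
            CREATE TABLE messages AS
            SELECT e.dst AS id, MIN(v.dist + e.weight) AS dist
            FROM vertices v JOIN edges e ON v.id = e.src
            WHERE v.active = 1
            GROUP BY e.dst
            """
        )
        # Every vertex goes inactive unless this superstep improves it.
        cur.execute("UPDATE vertices SET active = 0")
        # Vertex update, case 1: a vertex receives its first message.
        cur.execute(
            """
            INSERT INTO vertices (id, dist, active)
            SELECT m.id, m.dist, 1 FROM messages m
            WHERE m.id NOT IN (SELECT id FROM vertices)
            """
        )
        # Vertex update, case 2: a shorter distance arrives for a known vertex.
        cur.execute(
            """
            UPDATE vertices
            SET dist = (SELECT m.dist FROM messages m WHERE m.id = vertices.id),
                active = 1
            WHERE EXISTS (
                SELECT 1 FROM messages m
                WHERE m.id = vertices.id AND m.dist < vertices.dist
            )
            """
        )
        # Halt when no vertex changed, i.e. no messages will be sent next round.
        (n_active,) = cur.execute(
            "SELECT COUNT(*) FROM vertices WHERE active = 1"
        ).fetchone()
        if n_active == 0:
            break

    result = dict(cur.execute("SELECT id, dist FROM vertices").fetchall())
    conn.close()
    return result
```

For example, `sssp_supersteps([(1, 2, 1), (2, 3, 2), (1, 3, 5), (3, 4, 1)], 1)` returns distances equal to `{1: 0, 2: 1, 3: 3, 4: 4}` (stored as floats). Tracking an `active` flag mirrors the vote-to-halt convention of vertex-centric systems, so each superstep joins only the vertices that changed rather than the whole vertex table.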
