An Empirical Study on Recent Graph Database Systems

Graphs are widely used to model the intricate relationships among objects in a wide range of applications. The advance in graph data has brought significant value to artificial intelligence technologies. Recently, a number of graph database systems have been developed. In this paper, we present a comprehensive overview and empirical investigation on existing property graph database systems such as Neo4j, AgensGraph, TigerGraph and LightGraph (LightGraph has recently renamed to TuGraph.). These systems support declarative graph query languages. Our empirical studies are conducted in a single-machine environment against on the LDBC social network benchmark, consisting of three different large-scale datasets and a set of benchmark queries. This is the first empirical study to compare these graph database systems by evaluating data bulk importing and processing simple and complex queries. Experimental results provide insightful observations of various graph data systems and indicate that AgensGraph works well on SQL based workload and simple update queries, TigerGraph is powerful on complex business intelligence queries, Neo4j is user-friendly and suitable for small queries, and LightGraph is a more balanced product achieving good performance on different queries. The related code, scripts and data of this paper are available online (https://github.com/UNSW-database/GraphDB-Benchmark).

[1]  Hassan Chafi,et al.  The LDBC Social Network Benchmark: Interactive Workload , 2015, SIGMOD Conference.

[2]  Yannis Velegrakis,et al.  Beyond Macrobenchmarks: Microbenchmark-based Graph Database Evaluation , 2018, Proc. VLDB Endow..

[3]  Reynold Xin,et al.  GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.

[4]  Amine Mhedhbi,et al.  The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing , 2017 .

[5]  Irena Holubová,et al.  Experimental Comparison of Graph Databases , 2013, IIWAS '13.

[6]  Yixin Cao,et al.  Explainable Reasoning over Knowledge Graphs for Recommendation , 2018, AAAI.

[7]  Jeremy Chen,et al.  Graphflow: An Active Graph Database , 2017, SIGMOD Conference.

[8]  Peter A. Boncz,et al.  An early look at the LDBC social network benchmark's business intelligence workload , 2018, GRADES/NDA@SIGMOD/PODS.

[9]  Xuemin Lin,et al.  PatMat: A Distributed Pattern Matching Engine with Cypher , 2019, CIKM.

[10]  Xiaoyong Du,et al.  Which Category Is Better: Benchmarking the RDBMSs and GDBMSs , 2019, APWeb/WAIM.

[11]  Danai Koutra,et al.  Graph based anomaly detection and description: a survey , 2014, Data Mining and Knowledge Discovery.

[12]  Panos Kalnis,et al.  A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data , 2017, Proc. VLDB Endow..

[13]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[14]  Florin Rusu,et al.  In-Depth Benchmarking of Graph Database Systems with the Linked Data Benchmark Council (LDBC) Social Network Benchmark (SNB) , 2019, ArXiv.