LSQB: a large-scale subgraph query benchmark

We introduce LSQB, a new large-scale subgraph query benchmark. LSQB tests the performance of database management systems (DBMSs) on an important class of subgraph queries overlooked by existing benchmarks. The benchmark focuses on subgraph matching, that is, matching a labelled structural graph pattern. In relational terms, it treats DBMSs' join performance as a choke point, since subgraph matching is equivalent to multi-way joins between base Vertex and base Edge tables on ID attributes. The benchmark targets read-heavy workloads by relying on global queries, which prior benchmarks have ignored. Global queries, also referred to as unseeded queries, are queries constrained only by the labels on the query vertices and edges. LSQB contains nine queries and leverages the LDBC social network data generator for scalability. The benchmark has gained both academic and industrial interest and is used internally by at least five vendors.
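As an illustration of the equivalence between subgraph matching and multi-way joins, the following minimal sketch expresses a global (unseeded) triangle pattern as joins between a Vertex table and an Edge table on ID attributes. The schema, the labels `Person` and `knows`, the sample data, and the function name `count_triangles` are assumptions made for this example; the query is not claimed to be one of the nine LSQB queries.

```python
import sqlite3


def count_triangles(vertices, edges):
    """Count matches of the directed pattern a -> b -> c -> a.

    The pattern is constrained only by labels (hypothetical 'Person'
    vertices and 'knows' edges), so it is a global, unseeded query.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Vertex (id INTEGER PRIMARY KEY, label TEXT)")
    con.execute("CREATE TABLE Edge (src INTEGER, dst INTEGER, label TEXT)")
    con.executemany("INSERT INTO Vertex VALUES (?, ?)", vertices)
    con.executemany("INSERT INTO Edge VALUES (?, ?, ?)", edges)

    # Multi-way join: every pattern vertex becomes a Vertex alias,
    # every pattern edge becomes an Edge alias, joined on IDs.
    (matches,) = con.execute(
        """
        SELECT COUNT(*)
        FROM Vertex a
        JOIN Edge e1 ON e1.src = a.id
        JOIN Vertex b ON b.id = e1.dst
        JOIN Edge e2 ON e2.src = b.id
        JOIN Vertex c ON c.id = e2.dst
        JOIN Edge e3 ON e3.src = c.id AND e3.dst = a.id
        WHERE a.label = 'Person' AND b.label = 'Person' AND c.label = 'Person'
          AND e1.label = 'knows' AND e2.label = 'knows' AND e3.label = 'knows'
        """
    ).fetchone()
    con.close()
    return matches


# Toy data (assumed for illustration): one 'knows' cycle 1 -> 2 -> 3 -> 1.
vertices = [(1, "Person"), (2, "Person"), (3, "Person"), (4, "Person"), (5, "Post")]
edges = [
    (1, 2, "knows"),
    (2, 3, "knows"),
    (3, 1, "knows"),
    (3, 4, "knows"),
    (1, 5, "likes"),
]
triangle_matches = count_triangles(vertices, edges)
print(triangle_matches)
```

Because no vertex is fixed to a particular ID, the query must consider every binding of `a`, `b`, and `c`; the single cycle in the toy data is therefore counted once per rotation. On large graphs the intermediate results of such joins can grow quickly, which is why join performance acts as the choke point.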
