The LDBC Social Network Benchmark: Business Intelligence Workload

The Social Network Benchmark's Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in several respects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph in which all nodes and edges follow lifespan-based rules with temporal skew, enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload, which exploits this skew and correlation, is based on LDBC's "choke point"-driven design methodology and is intended to stimulate technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of "parameter curation" in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result; only audited results can use this trademarked term.
