Paper Information - The LDBC Social Network Benchmark: Business Intelligence Workload

The Social Network Benchmark's Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions and where values correlate with structure. This is a temporal graph in which all nodes and edges follow lifespan-based rules with temporal skew, enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload, which exploits this skew and correlation, is based on LDBC's "choke point"-driven design methodology and aims to drive technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of "parameter curation" in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result; only audited results can use this trademarked term.
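The idea behind parameter curation can be illustrated with a minimal sketch. It is not the LDBC implementation: the function name `curate_parameters`, the tag names, and the counts are all hypothetical. It assumes that each candidate parameter value comes with a precomputed count approximating how much data the query would touch, and it picks the `k` candidates whose counts lie closest together, so that query variants are more likely to have similar runtimes.

```python
from typing import Dict, List


def curate_parameters(candidate_counts: Dict[str, int], k: int) -> List[str]:
    """Pick k candidates whose precomputed counts are closest together."""
    if k <= 0 or k > len(candidate_counts):
        raise ValueError("k must be between 1 and the number of candidates")
    # Sort candidates by count (ties broken by name for determinism).
    ranked = sorted(candidate_counts.items(), key=lambda item: (item[1], item[0]))
    best_start, best_spread = 0, None
    # Slide a window of size k and keep the one with the smallest spread
    # between its largest and smallest count.
    for start in range(len(ranked) - k + 1):
        spread = ranked[start + k - 1][1] - ranked[start][1]
        if best_spread is None or spread < best_spread:
            best_start, best_spread = start, spread
    return [name for name, _ in ranked[best_start:best_start + k]]


# Hypothetical per-tag message counts standing in for a query's workload.
counts = {"tagA": 12, "tagB": 980, "tagC": 1010, "tagD": 995, "tagE": 50000}
selected = curate_parameters(counts, 3)
print(selected)  # ['tagB', 'tagD', 'tagC']
```

In this toy input the outliers `tagA` and `tagE` are excluded, which is the intended effect: with skewed data, picking parameters uniformly at random would mix very cheap and very expensive query variants.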
