Paper Information - The Graph Based Benchmark Suite (GBBS)

The Graph Based Benchmark Suite (GBBS)

In this demonstration paper, we present the Graph Based Benchmark Suite (GBBS), a suite of scalable, provably efficient implementations of over 20 fundamental graph problems for shared-memory multicore machines. Our results are obtained using a graph processing interface written in C++ that extends the Ligra interface with additional functional primitives whose cost bounds are clearly defined. Because these primitives are highly optimized, high-level code written against them is both simple and high-performance. Another benefit is that optimizations such as graph compression are implemented transparently to high-level user code and can thus be used without changing the implementation. This approach lets our codes scale to the largest publicly available real-world graph, which contains over 200 billion edges, on a single multicore machine. We show how to use GBBS to process real-world graphs and perform a variety of tasks on them. We present the high-level C++ APIs that enable us to write concise, high-performance implementations. We also introduce a Python interface to GBBS, which lets users easily prototype algorithms and pipelines in Python that significantly outperform NetworkX, a mature Python-based graph processing solution.
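The abstract's two central ideas, writing algorithms against a small Ligra-style primitive and keeping graph compression invisible to that code, can be illustrated with a minimal sketch. The example below is sequential, pure Python, and is not the GBBS API: the names CSRGraph, DeltaGraph, edge_map, and bfs_parents are hypothetical and chosen only for illustration. It assumes an edge_map-like primitive that applies an update function over the edges leaving a frontier, and a gap-encoded neighbor list standing in for a real compression scheme.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


class CSRGraph:
    """Uncompressed directed graph in a compressed-sparse-row layout."""

    def __init__(self, num_vertices: int, edges: Sequence[Tuple[int, int]]):
        adj: List[List[int]] = [[] for _ in range(num_vertices)]
        for u, v in edges:
            adj[u].append(v)
        self.n = num_vertices
        self.offsets: List[int] = [0]
        self.targets: List[int] = []
        for nbrs in adj:
            self.targets.extend(sorted(nbrs))
            self.offsets.append(len(self.targets))

    def neighbors(self, v: int) -> Iterator[int]:
        return iter(self.targets[self.offsets[v]:self.offsets[v + 1]])


class DeltaGraph:
    """Same graph, with each sorted neighbor list stored as gaps.

    This is a stand-in for a real compressed format (an assumption of
    this sketch), decoded lazily when neighbors are requested.
    """

    def __init__(self, g: CSRGraph):
        self.n = g.n
        self.gaps: List[List[int]] = []
        for v in range(g.n):
            prev, row = 0, []
            for w in g.neighbors(v):
                row.append(w - prev)
                prev = w
            self.gaps.append(row)

    def neighbors(self, v: int) -> Iterator[int]:
        cur = 0
        for gap in self.gaps[v]:
            cur += gap
            yield cur


def edge_map(graph, frontier: List[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool]) -> List[int]:
    """Apply update(u, v) to edges leaving the frontier where cond(v) holds.

    Returns the vertices for which update returned True (the next frontier).
    """
    nxt: List[int] = []
    for u in frontier:
        for v in graph.neighbors(u):
            if cond(v) and update(u, v):
                nxt.append(v)
    return nxt


def bfs_parents(graph, source: int) -> List[int]:
    """Breadth-first search written only in terms of edge_map."""
    parent = [-1] * graph.n
    parent[source] = source

    def update(u: int, v: int) -> bool:
        parent[v] = u
        return True

    def cond(v: int) -> bool:
        return parent[v] == -1  # only visit unvisited vertices

    frontier = [source]
    while frontier:
        frontier = edge_map(graph, frontier, update, cond)
    return parent
```

Calling bfs_parents on a CSRGraph and on the DeltaGraph built from it gives the same parent array, because the algorithm touches the graph only through neighbors and edge_map. A parallel implementation would additionally need an atomic update (for example a compare-and-swap on the parent entry), which this sequential sketch omits.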
