The Graph Based Benchmark Suite (GBBS)

In this demonstration paper, we present the Graph Based Benchmark Suite (GBBS), a suite of scalable, provably efficient implementations of over 20 fundamental graph problems for shared-memory multicore machines. The implementations are built on a graph processing interface written in C++ that extends the Ligra interface with additional functional primitives that have clearly defined cost bounds. Because these primitives are highly optimized, high-level code written against the interface is both simple and high-performance. Optimizations such as graph compression are implemented transparently to high-level user code, so they can be used without changing an algorithm's implementation. This approach lets our codes scale to the largest publicly available real-world graph, which contains over 200 billion edges, on a single multicore machine. We show how to use GBBS to process real-world graphs and perform a variety of tasks on them. We present the high-level C++ APIs that enable us to write concise, high-performance implementations. We also introduce a Python interface to GBBS, which lets users easily prototype algorithms and pipelines in Python that significantly outperform NetworkX, a mature Python-based graph processing solution.
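
To give a feel for what a Ligra-style frontier interface looks like, the following is a minimal sequential sketch in plain Python. It is not the GBBS or Ligra API: the names `edge_map`, `bfs`, `update`, and `cond` are hypothetical and chosen only for illustration, and the real systems run such primitives in parallel over compressed or uncompressed graphs.

```python
from typing import Callable, Dict, List, Set

# Adjacency-list graph: vertex id -> list of out-neighbors (an assumed toy format).
Graph = Dict[int, List[int]]


def edge_map(graph: Graph,
             frontier: Set[int],
             update: Callable[[int, int], bool],
             cond: Callable[[int], bool]) -> Set[int]:
    """Hypothetical frontier primitive.

    Applies update(u, v) to each edge (u, v) with u in the frontier and
    cond(v) true, and returns the targets for which update returned True.
    """
    next_frontier: Set[int] = set()
    for u in sorted(frontier):          # sorted only to make the sketch deterministic
        for v in graph.get(u, []):
            if cond(v) and update(u, v):
                next_frontier.add(v)
    return next_frontier


def bfs(graph: Graph, source: int) -> Dict[int, int]:
    """Breadth-first search written against edge_map; returns a parent map."""
    parent: Dict[int, int] = {source: source}

    def update(u: int, v: int) -> bool:
        parent[v] = u                   # first visitor becomes the parent
        return True

    def cond(v: int) -> bool:
        return v not in parent          # only visit unvisited vertices

    frontier: Set[int] = {source}
    while frontier:
        frontier = edge_map(graph, frontier, update, cond)
    return parent
```

The algorithm itself only supplies `update` and `cond`; how the edges are stored and traversed stays inside `edge_map`, which is the kind of separation that lets an optimization such as compression be swapped in without touching the algorithm code.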
