Vertical, Temporal, and Horizontal Scaling of Hierarchical Hypersparse GraphBLAS Matrices

Hypersparse matrices are a powerful enabler for a variety of network, health, finance, and social applications. Hierarchical hypersparse GraphBLAS matrices enable rapid streaming updates while preserving algebraic analytic power and convenience. In many contexts, the rate of these updates sets the bounds on performance. This paper explores hierarchical hypersparse update performance on a variety of hardware with identical software configurations. The high-level language bindings of the GraphBLAS readily enable simultaneous performance experiments on diverse hardware. The best single-process performance measured was 4,000,000 updates per second. The best single-node performance measured was 170,000,000 updates per second. The hardware used spans nearly a decade and allows a direct comparison of hardware improvements for this computation over that period, showing a 2x increase in single-core performance, a 3x increase in single-process performance, and a 5x increase in single-node performance. Running on nearly 2,000 MIT SuperCloud nodes simultaneously achieved a sustained update rate of over 200,000,000,000 updates per second. Hierarchical hypersparse GraphBLAS allows the MIT SuperCloud to analyze extremely large streaming network data sets.
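To make the idea of hierarchical updates concrete, the following is a minimal sketch in plain Python, not the GraphBLAS implementation measured in the paper. It assumes the usual cascading scheme: each update is added to a small first level, and when a level holds more stored entries than its cut value, it is added into the next level and cleared. The class name `HierarchicalCounter`, the dictionary-based storage, and the cut values are hypothetical choices made only for illustration.

```python
from collections import Counter


class HierarchicalCounter:
    """Toy stand-in for a hierarchical hypersparse matrix (assumed scheme)."""

    def __init__(self, cuts):
        # cuts[i] is a hypothetical limit on stored entries in level i.
        self.cuts = list(cuts)
        # One more level than cuts: the last level is never flushed.
        self.levels = [Counter() for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, value=1):
        # Every streaming update touches only the smallest level.
        self.levels[0][(row, col)] += value
        # Cascade: a level that exceeds its cut is added into the next level.
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) <= cut:
                break
            self.levels[i + 1].update(self.levels[i])
            self.levels[i].clear()

    def total(self):
        # The full matrix is the sum of all levels.
        out = Counter()
        for level in self.levels:
            out.update(level)
        return out


# Usage: stream a few (row, col) updates through a two-cut hierarchy.
h = HierarchicalCounter(cuts=[2, 4])
for edge in [(1, 2), (1, 2), (3, 4), (5, 6), (1, 2)]:
    h.update(*edge)
combined = h.total()
```

The design point in this sketch is that most updates only touch the small first level, which stays cheap to modify, while the sum over all levels still gives the complete matrix for analysis.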
