75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

The SuiteSparse GraphBLAS C library implements high-performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and MATLAB/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that is well suited to analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into a hypersparse matrix. The parameters of hierarchical hypersparse matrices control the number of entries allowed in each level of the hierarchy before an update is cascaded to the next level. These parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
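The cascading idea described above can be illustrated with a minimal sketch. This is not the SuiteSparse GraphBLAS implementation benchmarked in this work: plain Python dictionaries stand in for hypersparse matrices, and the names `HierarchicalCounter`, `cuts`, `update`, and `total` are hypothetical, chosen only for illustration. Each level holds a bounded number of entries; when a level exceeds its cut, its contents are added into the next level and the level is cleared. Because addition is linear, the full matrix is assumed to be recoverable at any time by summing all levels.

```python
from collections import defaultdict


class HierarchicalCounter:
    """Toy hierarchical accumulator; dicts stand in for hypersparse matrices."""

    def __init__(self, cuts):
        # cuts[i] is the assumed maximum number of stored entries in level i
        # before that level is cascaded into level i + 1.
        self.cuts = list(cuts)
        # One more level than cuts: the last level is unbounded.
        self.levels = [defaultdict(int) for _ in range(len(self.cuts) + 1)]

    def update(self, row, col, value=1):
        # New updates always land in the smallest (fastest) level.
        self.levels[0][(row, col)] += value
        self._cascade()

    def _cascade(self):
        # Push a level into the next one only when it exceeds its cut.
        for i, cut in enumerate(self.cuts):
            if len(self.levels[i]) <= cut:
                break
            nxt = self.levels[i + 1]
            for key, val in self.levels[i].items():
                nxt[key] += val
            self.levels[i].clear()

    def total(self):
        # Linearity: the full matrix is the sum of all levels.
        out = defaultdict(int)
        for level in self.levels:
            for key, val in level.items():
                out[key] += val
        return dict(out)


h = HierarchicalCounter(cuts=[2, 4])
for r, c in [(1, 2), (3, 4), (1, 2), (5, 6), (7, 8), (1, 2)]:
    h.update(r, c)
result = h.total()
```

In this sketch most updates touch only the small first level, which is the intuition behind the reduced memory pressure; the cut values here are arbitrary and would need tuning for a real workload.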
