GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

Recent work on graph analytics has sought to leverage the high performance offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithm and limitations in GPU-resident memory for storing large graphs. The Graph Reduce methods presented in this paper permit a GPU-based accelerator to operate on graphs that exceed its internal memory capacity. Graph Reduce operates with a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model, to achieve high degrees of parallelism supported by methods that partition graphs across GPU and host memories and efficiently move graph data between both. Graph Reduce-based programming is performed via device functions that include gather map, gather reduce, apply, and scatter, implemented by programmers for the graph algorithms they wish to realize. Experimental evaluations for a wide variety of graph inputs, algorithms, and system configuration demonstrate that Graph Reduce outperforms other competing approaches.