GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems

Recent work on graph analytics has sought to leverage the high performance offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and the limited capacity of GPU-resident memory for storing large graphs. The GraphReduce methods presented in this paper permit a GPU-based accelerator to operate on graphs that exceed its internal memory capacity. GraphReduce combines edge-centric and vertex-centric implementations of the Gather-Apply-Scatter (GAS) programming model to achieve high degrees of parallelism. It is supported by methods that partition graphs across GPU and host memories and efficiently move graph data between the two. Programmers use GraphReduce by implementing a set of device functions (gather map, gather reduce, apply, and scatter) for the graph algorithm they wish to realize. Experimental evaluations across a wide variety of graph inputs, algorithms, and system configurations demonstrate that GraphReduce outperforms competing approaches.
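The abstract names the four programmer-supplied functions but not their signatures or how partitions are scheduled. The following CPU-only Python sketch makes the structure concrete under stated assumptions. The `GASProgram` container, the function signatures, the destination-interval sharding (loosely modeled on GraphChi-style shards), and the PageRank instance are all hypothetical illustrations, not GraphReduce's actual CUDA API. Gather runs edge by edge over one shard at a time, which stands in for copying a partition from host to GPU memory. Apply runs per vertex. Scatter is simplified to a per-vertex "did my value change?" test that drives convergence.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Edge = Tuple[int, int]  # (source, destination)


@dataclass
class GASProgram:
    """Hypothetical container for the four user-defined GAS functions."""
    gather_map: Callable[[float, int], float]       # contribution sent along one edge
    gather_reduce: Callable[[float, float], float]  # combines contributions per destination
    gather_identity: float                          # neutral element for gather_reduce
    apply: Callable[[float, float], float]          # new value from old value + gathered total
    scatter: Callable[[float, float], bool]         # simplified: reports whether a vertex changed


def make_shards(edges: List[Edge], num_vertices: int, num_shards: int) -> List[List[Edge]]:
    """Assumed partitioning: group edges by destination-vertex interval, sorted by source.

    Each shard can then be moved to (simulated) device memory on its own.
    """
    interval = -(-num_vertices // num_shards)  # ceiling division
    shards: List[List[Edge]] = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // interval].append((src, dst))
    return [sorted(shard) for shard in shards]


def run(program: GASProgram, values: List[float], edges: List[Edge],
        num_shards: int, max_iters: int = 1000) -> Tuple[List[float], int]:
    """Iterate gather/apply/scatter until no vertex changes or max_iters is reached."""
    n = len(values)
    out_degree = [0] * n
    for src, _ in edges:
        out_degree[src] += 1
    shards = make_shards(edges, n, num_shards)

    for iteration in range(1, max_iters + 1):
        acc = [program.gather_identity] * n
        # Edge-centric gather: process one shard at a time
        # (stands in for a host-to-GPU transfer of that partition).
        for shard in shards:
            for src, dst in shard:
                contribution = program.gather_map(values[src], out_degree[src])
                acc[dst] = program.gather_reduce(acc[dst], contribution)
        # Vertex-centric apply, then the simplified scatter/change test.
        new_values = [program.apply(values[v], acc[v]) for v in range(n)]
        changed = [program.scatter(values[v], new_values[v]) for v in range(n)]
        values = new_values
        if not any(changed):
            return values, iteration
    return values, max_iters


def pagerank_program(n: int, damping: float = 0.85, tol: float = 1e-9) -> GASProgram:
    """Example algorithm: PageRank expressed with the hypothetical GAS functions."""
    return GASProgram(
        gather_map=lambda rank, deg: rank / deg,
        gather_reduce=lambda a, b: a + b,
        gather_identity=0.0,
        apply=lambda _old, total: (1 - damping) / n + damping * total,
        scatter=lambda old, new: abs(new - old) > tol,
    )


if __name__ == "__main__":
    graph = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
    ranks, iters = run(pagerank_program(4), [0.25] * 4, graph, num_shards=2)
    print(iters, [round(r, 4) for r in ranks])
```

Because every destination vertex belongs to exactly one shard, the gathered totals do not depend on how many shards are used. Only the amount of data resident at once changes, which is the property that lets a graph larger than device memory be processed piece by piece.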
