Paper information - On the Efficacy of APUs for Heterogeneous Graph Computation

On the Efficacy of APUs for Heterogeneous Graph Computation

Accelerated Processing Units (APUs) are central processors that feature integrated GPU cores. In this study, we show that this architecture is well-suited to the domain of graph analysis. Our evaluation shows that a current-generation integrated GPU can outperform an externally-connected discrete GPU by up to 50% for the breadth-first search and PageRank algorithms. Furthermore, by operating on data with different characteristics in unison, the CPU and integrated GPU can halve the running time of PageRank on a scale-free dataset.

Karthik Nilakant
