The energy case for graph processing on hybrid CPU and GPU systems

This paper investigates the power, energy, and performance characteristics of large-scale graph processing on hybrid (i.e., CPU and GPU) single-node systems. Graph processing can be accelerated on hybrid systems by properly mapping the graph-layout to processing units, such that the algorithmic tasks exercise each of the units where they perform best. However, the GPUs have much higher Thermal Design Power (TDP), thus their impact on the overall energy consumption is unclear. Our evaluation using large real-world graphs and synthetic graphs as large as 1 billion vertices and 16 billion edges shows that a hybrid system is efficient in terms of both time-to-solution and energy.

[1]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[2]  Albert-László Barabási,et al.  Internet: Diameter of the World-Wide Web , 1999, Nature.

[3]  Rajeev Motwani,et al.  The PageRank Citation Ranking : Bringing Order to the Web , 1999, WWW 1999.

[4]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[5]  Ravi Kumar,et al.  Structure and evolution of online social networks , 2006, KDD '06.

[6]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[7]  Sebastiano Vigna,et al.  A large time-aware web graph , 2008, SIGF.

[8]  Matei Ripeanu,et al.  Size Matters: Space/Time Tradeoffs to Improve GPGPU Applications Performance , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[9]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[10]  David A. Bader,et al.  Scalable Graph Exploration on Multicore Processors , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[11]  Matthias S. Müller,et al.  Characterizing the energy consumption of data transfers and arithmetic operations on x86−64 processors , 2010, International Conference on Green Computing.

[12]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[13]  Wu-chun Feng,et al.  Power and Performance Characterization of Computational Kernels on the GPU , 2010, 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing.

[14]  Kunle Olukotun,et al.  Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.

[15]  Kunle Olukotun,et al.  Efficient Parallel Graph Exploration on Multi-Core CPU and GPU , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.

[16]  Marco Rosa,et al.  Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks , 2010, WWW.

[17]  Pradeep Dubey,et al.  Fast and Efficient Graph Traversal Algorithm for CPUs: Maximizing Single-Node Efficiency , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[18]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[19]  Antony Rowstron,et al.  Nobody ever got fired for using Hadoop on a cluster , 2012, HotCDP '12.

[20]  Guy E. Blelloch,et al.  GraphChi: Large-Scale Graph Computation on Just a PC , 2012, OSDI.

[21]  Matei Ripeanu,et al.  A yoke of oxen and a thousand chickens for heavy lifting graph processing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[22]  Hiroshi Sasaki,et al.  Power and Performance Analysis of GPU-Accelerated Systems , 2012, HotPower.

[23]  Robert Schöne,et al.  Memory Performance at Reduced CPU Clock Speeds: An Analysis of Current x86_64 Processors , 2012, HotPower.

[24]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[25]  Matei Ripeanu,et al.  On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[26]  Jimmy J. Lin,et al.  WTF: the who to follow service at Twitter , 2013, WWW.

[27]  Jinha Kim,et al.  TurboGraph: a fast parallel graph engine handling billion-scale graphs in a single PC , 2013, KDD.