How Well do CPU, GPU and Hybrid Graph Processing Frameworks Perform?

The importance of high-performance graph processing to solve big data problems targeting high-impact applications is greater than ever before. Recent graph processing frameworks target different hardware platforms (e.g., shared memory systems, accelerators such as GPUs, and distributed systems) and differ with respect to the programming model they adopt (e.g., based on linear algebra formulations of graph algorithms or enabling direct access to the graph structure). To better understand the impact of these choices, this paper, presents a comparative study of five state-of-the-art graph processing frameworks: two CPU-only frameworks - GraphMat and Galois, two GPU-based frameworks - Nvgraph and Gunrock; and Totem, a hybrid (CPU+GPU) framework. We use three popular graph algorithms (PageRank, Single Source Shortest Path, and Breadth-First Search), and massive scale graphs with up to billions of edges. Our evaluation focuses on three performance metrics: (i) execution time, (ii) scalability and (iii) energy consumption.

[1]  Keval Vora,et al.  CuSha: vertex-centric graph processing on GPUs , 2014, HPDC '14.

[2]  Ali Tizghadam,et al.  A graph theoretical approach to traffic engineering and network control problem , 2009, 2009 21st International Teletraffic Congress.

[3]  Alexandru Iosup,et al.  How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[4]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[5]  Alexandru Iosup,et al.  Design and Experimental Evaluation of Distributed Heterogeneous Graph-Processing Systems , 2016, CCGrid.

[6]  Kunle Olukotun,et al.  Accelerating CUDA graph algorithms at maximum warp , 2011, PPoPP '11.

[7]  Andrew A. Chien,et al.  Understanding Graph Computation Behavior to Enable Robust Benchmarking , 2015, HPDC.

[8]  Benno Schwikowski,et al.  Discovering regulatory and signalling circuits in molecular interaction networks , 2002, ISMB.

[9]  Ching-Yung Lin,et al.  GraphBIG: understanding graph computing in the context of industrial solutions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[10]  Tinkara Toš,et al.  Graph Algorithms in the Language of Linear Algebra , 2012, Software, environments, tools.

[11]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[12]  H. M. Bucker,et al.  Reformulating a Breadth-First Search Algorithm on an Undirected Graph in the Language of Linear Algebra , 2014, 2014 International Conference on Mathematics and Computers in Sciences and in Industry.

[13]  Timothy A. Davis,et al.  The university of Florida sparse matrix collection , 2011, TOMS.

[14]  Andrew S. Grimshaw,et al.  Scalable GPU graph traversal , 2012, PPoPP '12.

[15]  John D. Owens,et al.  Gunrock: a high-performance graph processing library on the GPU , 2015, PPoPP.

[16]  Guy E. Blelloch,et al.  Ligra: a lightweight graph processing framework for shared memory , 2013, PPoPP '13.

[17]  José E. Moreira,et al.  Graph Programming Interface (GPI): A Linear Algebra Programming Model for Large Scale Graph Computations , 2016, International Journal of Parallel Programming.

[18]  Jure Leskovec,et al.  Defining and evaluating network communities based on ground-truth , 2012, Knowledge and Information Systems.

[19]  Matei Ripeanu,et al.  A yoke of oxen and a thousand chickens for heavy lifting graph processing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[20]  Ulrich Meyer,et al.  [Delta]-stepping: a parallelizable shortest path algorithm , 2003, J. Algorithms.

[21]  Pradeep Dubey,et al.  GraphMat: High performance graph analytics made productive , 2015, Proc. VLDB Endow..

[22]  Brian W. Barrett,et al.  Introducing the Graph 500 , 2010 .

[23]  Dan Meng,et al.  An evaluation and analysis of graph processing frameworks on five key issues , 2015, Conf. Computing Frontiers.

[24]  P. J. Narayanan,et al.  Accelerating Large Graph Algorithms on the GPU Using CUDA , 2007, HiPC.

[25]  Alexandru Iosup,et al.  LDBC Graphalytics: A Benchmark for Large-Scale Graph Analysis on Parallel and Distributed Platforms , 2016, Proc. VLDB Endow..

[26]  John R. Gilbert,et al.  On the representation and multiplication of hypersparse matrices , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[27]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[28]  Matei Ripeanu,et al.  On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[29]  M. Tamer Özsu,et al.  An Experimental Comparison of Pregel-like Graph Processing Systems , 2014, Proc. VLDB Endow..

[30]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.

[31]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[32]  Keshav Pingali,et al.  A lightweight infrastructure for graph analytics , 2013, SOSP.

[33]  Krishna P. Gummadi,et al.  Measuring User Influence in Twitter: The Million Follower Fallacy , 2010, ICWSM.

[34]  Matei Ripeanu,et al.  Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems , 2013, ArXiv.

[35]  David A. Patterson,et al.  Direction-optimizing Breadth-First Search , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.

[36]  Alexandru Iosup,et al.  Graphalytics: A Big Data Benchmark for Graph-Processing Platforms , 2015, GRADES@SIGMOD/PODS.