GraphGrind: addressing load imbalance of graph partitioning

We investigate how graph partitioning adversely affects the performance of graph analytics. We demonstrate that graph partitioning induces extra work during graph traversal and that graph partitions have markedly different connectivity than the original graph. As a consequence, increasing the number of partitions reaches a tipping point beyond which overheads quickly outweigh the performance gains. Moreover, we show that the common heuristic of balancing CPU load between graph partitions by balancing the number of edges is inappropriate for a range of graph analyses. Even when the heuristic is appropriate, it is sub-optimal because of the skewed degree distribution of social networks. Based on these observations, we propose GraphGrind, a new graph analytics system that addresses the limitations incurred by graph partitioning. We also propose a NUMA-aware extension to the Cilk programming language, obtaining a scale-free yet NUMA-aware parallel programming environment that underpins NUMA-aware scheduling in GraphGrind. We demonstrate that GraphGrind outperforms state-of-the-art graph analytics systems for shared memory, including Ligra, Polymer and Galois.
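To make the edge-balancing heuristic concrete, the following is a minimal sketch and not GraphGrind's implementation. It assumes that partitions are contiguous vertex ranges and that the per-vertex degree is known; the function name `partition_by_edges` and the toy degree list are hypothetical. With a skewed degree distribution, equal edge counts per partition can coincide with very unequal vertex counts, so any step whose cost depends on the number of vertices would be unevenly loaded.

```python
def partition_by_edges(degrees, num_parts):
    """Split vertices 0..n-1 into contiguous ranges with roughly equal edge counts.

    Illustrative assumption: each partition is a contiguous vertex range and
    degrees[v] is the number of edges assigned to vertex v.
    """
    total = sum(degrees)
    bounds = [0]
    acc = 0
    for v, d in enumerate(degrees):
        acc += d
        # Close the current partition once the running edge count reaches
        # its cumulative share (k / num_parts of all edges for the k-th cut).
        if len(bounds) < num_parts and acc * num_parts >= total * len(bounds):
            bounds.append(v + 1)
    # If the vertices ran out before all cuts were placed, pad with empty ranges.
    while len(bounds) < num_parts:
        bounds.append(len(degrees))
    bounds.append(len(degrees))
    return [(bounds[i], bounds[i + 1]) for i in range(num_parts)]


# Toy skewed degree sequence: a few high-degree vertices, many low-degree ones.
degrees = [12, 6, 3, 3] + [1] * 12
parts = partition_by_edges(degrees, 3)

edge_counts = [sum(degrees[lo:hi]) for lo, hi in parts]
vertex_counts = [hi - lo for lo, hi in parts]

print(parts)          # [(0, 1), (1, 4), (4, 16)]
print(edge_counts)    # [12, 12, 12]
print(vertex_counts)  # [1, 3, 12]
```

In this toy input every partition holds 12 edges, yet the partitions hold 1, 3 and 12 vertices respectively, which illustrates how a partitioning that looks balanced by edges can be imbalanced by vertices.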
