A Study of Partitioning Policies for Graph Analytics on Large-scale Distributed Platforms

Distributed-memory clusters are used for in-memory processing of very large graphs with billions of nodes and edges. This requires partitioning the graph among the machines in the cluster. When a graph is partitioned, a node in the graph may be replicated on several machines, and communication is required to keep these replicas synchronized. Good partitioning policies attempt to reduce this synchronization overhead while keeping the computational load balanced across machines. A number of recent studies have looked at ways to control replication of nodes, but these studies are not conclusive because they were performed on small clusters with eight to sixteen machines, did not consider work-efficient data-driven algorithms, or did not optimize communication for the partitioning strategies they studied. This paper presents an experimental study of partitioning strategies for work-efficient graph analytics applications on large Intel Knights Landing (KNL) and Skylake clusters with up to 256 machines, using the Gluon communication runtime, which implements partitioning-specific communication optimizations. Evaluation results show that although simple partitioning strategies like Edge-Cut perform well on a small number of machines, an alternative partitioning strategy called Cartesian Vertex-Cut (CVC) performs better at scale even though, paradoxically, it has a higher replication factor and performs more communication than Edge-Cut partitioning does. Results from communication micro-benchmarks resolve this paradox by showing that communication overhead depends not only on communication volume but also on the communication pattern among the partitions. These experiments suggest that high-performance graph analytics systems should support multiple partitioning strategies, as Gluon does, because no single graph partitioning strategy is best for all cluster sizes. For such systems, this paper also presents a decision tree for selecting a good partitioning strategy based on characteristics of the computation and the cluster.
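The trade-off between replication and communication pattern can be made concrete with a small simulation. The Python sketch below rests on simplifying assumptions that do not come from the paper. Masters are assigned to hosts in contiguous blocks of node IDs. The Edge-Cut places each edge on the host that owns its source, which makes it an outgoing edge-cut. The simplified CVC arranges P = pr × pc hosts in a grid and places edge (u, v) on the host in the grid row of u's master and the grid column of v's master. For each policy, the sketch counts proxies (masters plus mirrors) to get the replication factor, and it also records, for every host, the set of other hosts it must synchronize with. The helper names `owner`, `edge_cut`, `cartesian_vertex_cut`, and `proxy_stats` are hypothetical and are not part of Gluon's API.

```python
import random
from collections import defaultdict


def owner(node, num_nodes, num_hosts):
    """Hypothetical blocked master assignment: contiguous, near-equal ID ranges."""
    return node * num_hosts // num_nodes


def edge_cut(edges, num_nodes, num_hosts):
    """Outgoing Edge-Cut (assumed): each edge lives on the host owning its source."""
    return [owner(u, num_nodes, num_hosts) for u, _ in edges]


def cartesian_vertex_cut(edges, num_nodes, pr, pc):
    """Simplified CVC (assumed) on a pr x pc host grid: edge (u, v) goes to the
    host in the grid row of u's master and the grid column of v's master."""
    hosts = pr * pc
    placement = []
    for u, v in edges:
        row = owner(u, num_nodes, hosts) // pc
        col = owner(v, num_nodes, hosts) % pc
        placement.append(row * pc + col)
    return placement


def proxy_stats(edges, placement, num_nodes, num_hosts):
    """Return (replication factor, max number of hosts any one host talks to)."""
    proxies = defaultdict(set)  # host -> nodes with a proxy (master or mirror) there
    for n in range(num_nodes):
        proxies[owner(n, num_nodes, num_hosts)].add(n)
    for (u, v), h in zip(edges, placement):
        proxies[h].update((u, v))
    replication = sum(len(s) for s in proxies.values()) / num_nodes
    # A mirror on host h synchronizes with its master's host, so the pair
    # of hosts becomes communication partners in both directions.
    partners = defaultdict(set)
    for h, nodes in proxies.items():
        for n in nodes:
            m = owner(n, num_nodes, num_hosts)
            if m != h:
                partners[h].add(m)
                partners[m].add(h)
    max_partners = max((len(p) for p in partners.values()), default=0)
    return replication, max_partners


if __name__ == "__main__":
    rng = random.Random(42)
    num_nodes, pr, pc = 1024, 4, 4
    hosts = pr * pc
    # Uniform random edges stand in for a real input graph.
    edges = [(rng.randrange(num_nodes), rng.randrange(num_nodes))
             for _ in range(8 * num_nodes)]
    ec = proxy_stats(edges, edge_cut(edges, num_nodes, hosts), num_nodes, hosts)
    cvc = proxy_stats(edges, cartesian_vertex_cut(edges, num_nodes, pr, pc),
                      num_nodes, hosts)
    print(f"Edge-Cut: replication {ec[0]:.2f}, max partners {ec[1]}")
    print(f"CVC:      replication {cvc[0]:.2f}, max partners {cvc[1]}")
```

Under these assumptions, every proxy of a node lies in the grid row or grid column of its master. In the CVC sketch, each host therefore exchanges messages with at most pr + pc - 2 other hosts. An Edge-Cut host, by contrast, may need to reach all P - 1 others. On 256 hosts arranged as a 16 × 16 grid, that bound is 30 partners instead of 255. This shows how a pattern with fewer partners can pay off even when total communication volume is higher.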
