Paper Information - A Simple Yet Effective Balanced Edge Partition Model for Parallel Computing

Graph edge partition models have recently become an appealing alternative to graph vertex partition models for distributed computing, owing both to their flexibility in balancing loads and to their performance in reducing communication cost. In this paper, we propose a simple yet effective graph edge partitioning algorithm. In practice, our algorithm provides good partition quality while maintaining low partition overhead. It also outperforms similar state-of-the-art edge partition approaches, especially for power-law graphs. In theory, previous work showed that an approximation guarantee of O(d_max √(log n log k)) applies to graphs with m = Ω(k^2) edges, where n is the number of vertices, k is the number of partitions, and m is the number of edges. We rigorously prove that this approximation guarantee holds for all graphs. We also demonstrate the applicability of the proposed edge partition algorithm in real parallel computing systems. We draw our example from GPU program locality enhancement and demonstrate that the graph edge partition model applies not only to distributed computing with many computer nodes, but also to parallel computing on a single computer node with a many-core processor.
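To make the two quantities in the abstract concrete (load balance and communication cost), the sketch below scores a given edge partition. It is not the paper's algorithm: it only evaluates an assignment that is already given. It assumes that communication cost is measured as the number of extra vertex copies (a vertex whose edges fall in c parts needs c - 1 extra copies), which is one common way to measure it in edge partition models. The function name `edge_partition_quality` and its parameters are hypothetical.

```python
from collections import defaultdict


def edge_partition_quality(edges, assignment, k):
    """Score an edge partition (illustrative sketch, not the paper's method).

    edges      -- list of (u, v) pairs
    assignment -- list of part ids in range(k), one per edge
    k          -- number of parts
    Returns (max_load, replication_cost).
    """
    loads = [0] * k                      # number of edges placed in each part
    parts_of_vertex = defaultdict(set)   # parts in which each vertex appears

    for (u, v), part in zip(edges, assignment):
        loads[part] += 1
        parts_of_vertex[u].add(part)
        parts_of_vertex[v].add(part)

    # Assumed cost model: a vertex present in c parts needs c - 1 extra copies.
    replication_cost = sum(len(ps) - 1 for ps in parts_of_vertex.values())
    return max(loads), replication_cost


# A star graph: hub vertex 0 connected to four leaves, split into k = 2 parts.
edges = [(0, 1), (0, 2), (0, 3), (0, 4)]
assignment = [0, 0, 1, 1]
max_load, cost = edge_partition_quality(edges, assignment, 2)
print(max_load, cost)
```

In this example the edges are split evenly, and only the hub is replicated. A vertex partition of the same star would have to place the hub in a single part, which hints at why edge partitioning can balance high-degree (power-law) graphs more flexibly.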
