Semi-two-dimensional Partitioning for Parallel Sparse Matrix-Vector Multiplication

We propose a novel sparse matrix partitioning scheme, called semi-two-dimensional (s2D), for efficient parallelization of sparse matrix-vector multiply (SpMV) operations on distributed memory systems. In s2D, matrix nonzeros are distributed among processors more flexibly than in one-dimensional (rowwise or columnwise) partitioning schemes. Yet s2D carries a constraint that renders it less flexible than two-dimensional (nonzero-based) partitioning schemes. The constraint is enforced to confine all communication operations of a parallel SpMV operation to a single phase, as in 1D partitioning. Viewed positively, s2D is thus close to 2D partitioning in terms of flexibility and close to 1D partitioning in terms of computation/communication organization. We describe two methods that take partitions on the input and output vectors of SpMV and produce s2D partitions while reducing the total communication volume. The first method obtains the optimal total communication volume, while the second one heuristically reduces this quantity and takes computational load balance into account. We demonstrate that the proposed partitioning method improves the performance of parallel SpMV operations both in theory and in practice with respect to 1D and 2D partitionings.
