Communication balancing in parallel sparse matrix-vector multiplication

Given a partitioning of a sparse matrix for parallel matrix–vector multiplication, which fixes the total communication volume, we seek a vector partitioning that balances the communication load among the processors. We present a new lower bound on the maximum communication cost per processor, an optimal algorithm that attains this bound in the special case where each matrix column is owned by at most two processors, and a new heuristic algorithm for the general case that often attains the lower bound. The heuristic tries to avoid raising the current lower bound when it assigns vector components to processors. Experimental results show that the new algorithm often improves upon the heuristic currently implemented in the sparse matrix partitioning package Mondriaan. Trying both heuristics in combination with a greedy improvement procedure solves the problem optimally in most practical cases. The vector partitioning problem is proven to be NP-complete.
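To make the balancing objective concrete, the following minimal sketch counts how many vector components each processor sends and receives under a given vector partitioning. It is an illustration only, not the optimal algorithm or either heuristic. It makes several assumptions not stated in the abstract: only the input vector is considered, each component is assigned to a processor that already owns nonzeros in the corresponding column, and the cost of a processor is taken as the maximum of its send and receive counts. The names `comm_loads` and `max_cost` are hypothetical.

```python
def comm_loads(col_procs, owner, nprocs):
    """Count words sent and received per processor.

    col_procs[j] is the set of processors owning nonzeros in column j;
    owner[j] is the processor that vector component j is assigned to.
    The owner sends component j to every other processor in col_procs[j].
    """
    sends = [0] * nprocs
    recvs = [0] * nprocs
    for procs, own in zip(col_procs, owner):
        for q in procs:
            if q != own:
                sends[own] += 1
                recvs[q] += 1
    return sends, recvs


def max_cost(col_procs, owner, nprocs):
    """Assumed cost: the largest send or receive count over all processors."""
    sends, recvs = comm_loads(col_procs, owner, nprocs)
    return max(max(s, r) for s, r in zip(sends, recvs))


if __name__ == "__main__":
    # Four columns on three processors (made-up example data).
    col_procs = [{0, 1}, {0, 1}, {1, 2}, {0, 1, 2}]
    unbalanced = [0, 0, 1, 0]   # most components on processor 0
    balanced = [0, 1, 2, 1]     # components spread over the processors
    for owner in (unbalanced, balanced):
        sends, recvs = comm_loads(col_procs, owner, 3)
        print(owner, "volume:", sum(sends),
              "max cost:", max_cost(col_procs, owner, 3))
```

In this example both assignments move the same total number of words, because the matrix partitioning fixes the volume, but the maximum per-processor cost differs. That difference is what a vector partitioning can influence.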
