Is Scalability Relevant? A Look at Sparse Matrix-Vector Product
This paper considers the scalability of four parallel algorithms for computing y = Ax, where A is a sparse N × N matrix with k nonzeros per row and column, and x and y are dense vectors of length N. Rather than giving insight into which algorithm is most useful, the study highlights limitations of scalability analysis. In each algorithm, P processors each own one P-th of x, y, and A. Each algorithm has three phases: a round of communication in which each processor acquires the subset of x conforming to its submatrix; a local sparse matrix-vector product; and a second round of communication and additions in which each processor accumulates its segment of y. Algorithms are called 1D or 2D depending on whether A is partitioned into P strips of width N/P or into a √P × √P grid of submatrices. In the basic 1D algorithm, each processor sends N values in P messages; in the basic 2D algorithm, 2N/√P values are sent in 2√P messages.

The basic algorithms can be improved by applying one of two message-compression techniques: the first reduces the number of messages, while the second minimizes the number of values communicated. Hypercube algorithms [LvdG93] employ recursive doubling in the first round of communication and recursive halving in the second to reduce the number of messages per processor to log₂ P; the number of values sent remains the same. Reticent algorithms suppress the transmission of values and messages that are not needed. The expected communication costs of reticent algorithms are approximated by the Poisson distribution as follows. In the 1D algorithm, each processor sends about N(1 − e^(−k/P)) values in about P(1 − e^(−Nk/P²)) messages. In the 2D case, the number of values is about (2N/√P)(1 − e^(−k/√P)) and the number of messages is about 2√P(1 − e^(−Nk/P^(3/2))). If the matrix A is very sparse and P is moderately large, the reticent algorithms send substantially less data than the basic algorithms, though approximately the same number of messages.
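The communication-cost formulas above are easy to evaluate directly. The following sketch (hedged: the formulas are taken from the abstract; the particular values of N, k, and P are illustrative choices, not from the paper) compares the basic and reticent variants:

```python
import math

def basic_1d(N, P):
    """Basic 1D: each processor sends N values in P messages (per the abstract)."""
    return N, P

def basic_2d(N, P):
    """Basic 2D: 2N/sqrt(P) values in 2*sqrt(P) messages."""
    s = math.sqrt(P)
    return 2 * N / s, 2 * s

def reticent_1d(N, k, P):
    """Reticent 1D costs via the Poisson approximation in the abstract."""
    values = N * (1 - math.exp(-k / P))
    messages = P * (1 - math.exp(-N * k / P**2))
    return values, messages

def reticent_2d(N, k, P):
    """Reticent 2D costs via the Poisson approximation in the abstract."""
    s = math.sqrt(P)
    values = (2 * N / s) * (1 - math.exp(-k / s))
    messages = 2 * s * (1 - math.exp(-N * k / P**1.5))
    return values, messages

# Illustrative regime: a very sparse matrix and moderately many processors.
N, k, P = 10**6, 5, 1024
bv, bm = basic_1d(N, P)
rv, rm = reticent_1d(N, k, P)
```

For these parameters the reticent 1D algorithm sends on the order of N·k/P ≈ 4,900 values instead of N = 1,000,000, while the message count stays close to P, matching the abstract's claim that reticence saves data volume but not message count.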
An algorithm is scalable if the parallel efficiency (the ratio of sequential runtime to P times the parallel runtime) remains bounded away from zero as P increases arbitrarily. Letting the number of nonzeros that each processor owns be a constant (call it M) fixes the …

¹ To simplify analysis, we assume …
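The efficiency definition can be made concrete with a toy cost model. This is a sketch under stated assumptions, not the paper's analysis: the linear cost model (alpha per message, beta per value, gamma per flop) and the specific constants are hypothetical, and the communication terms use the abstract's reticent-1D Poisson formulas. With M nonzeros per processor held fixed, the total problem size is N = MP/k.

```python
import math

def efficiency_1d(M, k, P, alpha=1.0, beta=0.01, gamma=1.0):
    """Parallel efficiency T_seq / (P * T_par) under a hypothetical
    linear cost model, using the reticent 1D communication estimates.

    M:     nonzeros owned by each processor (held constant as P grows)
    k:     nonzeros per row/column
    alpha: cost per message, beta: cost per value, gamma: cost per nonzero
    """
    N = M * P / k                                  # scaled problem size
    t_seq = gamma * N * k                          # sequential work ~ total nonzeros
    values = N * (1 - math.exp(-k / P))            # reticent 1D values sent
    messages = P * (1 - math.exp(-N * k / P**2))   # reticent 1D messages sent
    t_par = gamma * M + alpha * messages + beta * values
    return t_seq / (P * t_par)
```

Under this model the efficiency is γM / (γM + α·messages + β·values); since the message count grows with P while M is fixed, efficiency decays as P increases, which is the kind of behavior the scalability question above probes.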