A balanced submatrix merging algorithm for multiprocessor architectures
In this article we describe a parallel algorithm that applies Givens rotations to selectively annihilate k(k + 1)/2 nonzero elements from two k × n (k ≤ n) upper trapezoidal submatrices. The new algorithm is suitable for implementation on either a pair of directly connected local-memory processors or two clusters of multiple tightly coupled processors. Our analyses show that, in both cases, the proposed algorithm achieves optimal speed-up when k ≪ n by balancing the workload distribution and masking inter-processor or inter-cluster communication with computation. In the context of solving large-scale least squares problems [1,4], this submatrix merging step is needed repeatedly throughout the computation. Furthermore, there are usually many pairs of such submatrices to be merged, with each submatrix stored in the memory of a processor or a cluster of processors. The proposed algorithm can be applied to each pair of submatrices concurrently and thus parallelizes an important step in solving least squares problems.
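To make the merging step concrete, the following is a minimal sequential sketch in Python, not the parallel algorithm proposed in the article. It assumes that the k(k + 1)/2 elements to annihilate are the leading k × k upper triangle of the second submatrix, and that each element (i, j) there is eliminated by rotating row i of the second submatrix against row j of the first. The function names `givens` and `merge_trapezoidal`, and the column-by-column elimination order, are illustrative assumptions rather than details taken from the abstract.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0  # nothing to annihilate: identity rotation
    r = math.hypot(a, b)
    return a / r, b / r


def merge_trapezoidal(R1, R2):
    """Annihilate the leading k x k triangle of R2 against R1 (assumed scheme).

    R1 and R2 are k x n upper trapezoidal matrices given as lists of rows.
    Returns updated copies (A, B) and the number of rotations applied.
    """
    k, n = len(R1), len(R1[0])
    A = [[float(x) for x in row] for row in R1]
    B = [[float(x) for x in row] for row in R2]
    rotations = 0
    for j in range(k):            # column currently being cleared in B
        for i in range(j + 1):    # rows of B that may be nonzero in column j
            c, s = givens(A[j][j], B[i][j])
            # Both rows are already zero left of column j, so start there.
            for col in range(j, n):
                a, b = A[j][col], B[i][col]
                A[j][col] = c * a + s * b
                B[i][col] = -s * a + c * b
            B[i][j] = 0.0         # remove rounding residue
            rotations += 1
    return A, B, rotations


if __name__ == "__main__":
    R1 = [[1, 2, 3, 4, 5], [0, 6, 7, 8, 9], [0, 0, 10, 11, 12]]
    R2 = [[2, 1, 3, 1, 2], [0, 4, 1, 5, 1], [0, 0, 3, 2, 6]]
    A, B, count = merge_trapezoidal(R1, R2)
    print("rotations applied:", count)
    for row in B:
        print([round(x, 4) for x in row])
```

In this sketch the first submatrix stays upper trapezoidal, while the second ends up with zeros in its first k columns and a generally dense k × (n − k) block to the right. Each rotation touches at most n entries in each of two rows, which suggests why the per-rotation work grows with n and can hide communication when k ≪ n; how that work is actually split between processors or clusters is not specified in the abstract and is not modeled here.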
[1] G. Golub, et al. Large scale geodetic least squares adjustment by dissection and orthogonal decomposition, 1979.
[2] G. Golub, et al. A comparison between some direct and iterative methods for certain large scale geodetic least squares problems, 1986.
[3] G. Golub, et al. Parallel block schemes for large-scale least-squares computations, 1988.