Paper information: QR factorization of a dense matrix on a shared-memory multiprocessor

QR factorization of a dense matrix on a shared-memory multiprocessor

A new algorithm for computing an orthogonal decomposition of a rectangular m × n matrix A on a shared-memory parallel computer is described. The algorithm uses Givens rotations, and has the feature that its synchronization cost is low. In particular, for a multiprocessor having p processors, an analysis of the algorithm shows that this cost is O(n²/p) if m/p ≥ n, and O(mn/p²) if m/p < n. Note that in the latter case, the synchronization cost is smaller than O(n²/p). Therefore, the synchronization cost of the algorithm proposed in this article is bounded by O(n²/p) when m ≥ n. This is important for machines where synchronization cost is high, and when m ≫ n. Analysis and experiments show that the algorithm is effective in balancing the load and producing high efficiency (speedup).
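To make the abstract's two ingredients concrete, the following is a minimal sequential sketch of a Givens-rotation reduction to upper-triangular form, plus a helper that evaluates the leading-order synchronization term quoted above. It is not the authors' parallel schedule: the function names `givens`, `givens_qr`, and `sync_cost_order` are hypothetical, the elimination order (bottom-up within each column) is an assumption chosen for simplicity, and constants hidden by the O-notation are dropped.

```python
import math


def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def givens_qr(A):
    """Reduce an m x n matrix (list of rows, m >= n) to upper-triangular R.

    Sequential illustration only: one rotation at a time, column by column,
    zeroing entries from the bottom row upwards. The paper's contribution is
    how such rotations are scheduled across p processors, which is not modeled.
    """
    R = [[float(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    for j in range(n):
        for i in range(m - 1, j, -1):
            # Rotate rows i-1 and i so that R[i][j] becomes zero.
            c, s = givens(R[i - 1][j], R[i][j])
            for k in range(j, n):
                upper, lower = R[i - 1][k], R[i][k]
                R[i - 1][k] = c * upper + s * lower
                R[i][k] = -s * upper + c * lower
            R[i][j] = 0.0  # zero by construction; clear rounding residue
    return R


def sync_cost_order(m, n, p):
    """Leading-order synchronization term from the abstract (constants dropped)."""
    return n * n / p if m / p >= n else m * n / (p * p)
```

Because every rotation is orthogonal, the result satisfies RᵀR = AᵀA, which gives a quick correctness check. The helper also shows why the second case is the cheaper one: when m/p < n, mn/p² = (m/p)(n/p) < n²/p.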

Alan George | Eleanor Chu
