A Block QR Factorization Scheme for Loosely Coupled Systems of Array Processors

A statically scheduled parallel block QR factorization procedure is described. It is based on "block" Givens rotations and is modeled after the Gentleman-Kung systolic QR procedure. Independent tasks are associated with each block column, and "tallest possible" subproblems are always solved. The method has been implemented on the IBM Kingston LCAP-1 system, which consists of ten FPS-164/MAX array processors that can communicate through a large shared bulk memory. The implementation exposed a tradeoff between block size and load balancing: large blocks make load balancing more difficult but give better FPS-164/MAX performance and less shared-memory traffic. The results indicate that our approach to parallelizing the QR factorization is competitive for very large problems, e.g., of order 5000 by 1000.
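The kernel named in the abstract can be made concrete with a minimal pure-Python sketch. It assumes that a "block" Givens rotation may be read as triangularizing two stacked upper-triangular blocks. `absorb_row` performs the row-at-a-time Givens update that a Gentleman-Kung triangular array pipelines, and `block_qr_r` factors each block of rows as an independent task and then merges the resulting R factors pairwise. All function names, the `block_rows` parameter, and the pairwise reduction tree are illustrative assumptions, not the paper's static schedule for the LCAP-1; only R is computed and Q is never formed.

```python
import math


def givens(a, b):
    """Return (c, s) so that [[c, s], [-s, c]] maps (a, b) to (r, 0) with r >= 0."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r


def absorb_row(R, x):
    """Rotate row x into the n-by-n upper-triangular R in place, one Givens
    rotation per column (the row update of a Gentleman-Kung style array)."""
    n = len(R)
    x = list(x)
    for k in range(n):
        c, s = givens(R[k][k], x[k])
        for j in range(k, n):
            rkj, xj = R[k][j], x[j]
            R[k][j] = c * rkj + s * xj   # updated row k of R
            x[j] = -s * rkj + c * xj     # x[k] becomes exactly zeroed out
    return R


def triangularize(rows, n):
    """R factor (n-by-n) of the matrix whose rows are `rows`."""
    R = [[0.0] * n for _ in range(n)]
    for x in rows:
        absorb_row(R, x)
    return R


def merge_blocks(R1, R2):
    """Hypothetical 'block rotation': R factor of the stacked matrix [R1; R2]."""
    R = [row[:] for row in R1]
    for x in R2:
        absorb_row(R, x)
    return R


def block_qr_r(A, block_rows):
    """R factor of A: independent factorizations of row blocks, then
    pairwise merges (a simple reduction tree, assumed for illustration)."""
    n = len(A[0])
    blocks = [A[i:i + block_rows] for i in range(0, len(A), block_rows)]
    Rs = [triangularize(b, n) for b in blocks]  # independent tasks
    while len(Rs) > 1:
        Rs = [merge_blocks(Rs[i], Rs[i + 1]) if i + 1 < len(Rs) else Rs[i]
              for i in range(0, len(Rs), 2)]
    return Rs[0]
```

For a full-rank A, `block_qr_r(A, 2)` agrees with `triangularize(A, len(A[0]))` up to rounding, because every rotation leaves the diagonal of R nonnegative and the R factor with positive diagonal is unique. A larger `block_rows` gives fewer, heavier tasks, which is the same kind of block-size versus load-balancing tension the abstract describes.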

Charles Van Loan

[1] Jack J. Dongarra, et al. Implementation of some concurrent algorithms for matrix factorization, 1986, Parallel Comput.

[2] H. T. Kung, et al. Matrix Triangularization By Systolic Arrays, 1982, Optics & Photonics.

[3] Ilse C. F. Ipsen, et al. Systolic Networks for Orthogonal Decompositions, 1983.

[4] Gene H. Golub, et al. Matrix computations, 1983.

[5] Christian H. Bischof, et al. The WY representation for products of householder matrices, 1985, PPSC.

[6] John L. Gustafson, et al. Introducing Replicated VLSI to Supercomputing: the FPS-164/MAX Scientific Computer, 1986, Computer.

[7] J. Liu, et al. Parallel Cholesky factorization on a multiprocessor, 1985.