Efficiency and scalability of two parallel QR factorization algorithms

Both the Householder QR factorization algorithm and the modified Gram-Schmidt algorithm can be written in terms of matrix-matrix operations using the compact WY representation. Parallelizations of the resulting algorithms are reviewed and analyzed. For this purpose, a general framework for analyzing the scalability of parallel algorithms is presented.
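
As a brief illustration of why the compact WY representation leads to matrix-matrix operations, the standard form of the representation can be sketched as follows. The notation here is an assumption following the common convention and is not taken from the abstract: each Householder reflector is written as H_j = I - tau_j v_j v_j^T, Y collects the vectors v_j as columns, and T is a k-by-k upper triangular matrix.

```latex
% Product of k Householder reflectors in compact WY form
Q = H_1 H_2 \cdots H_k = I - Y T Y^{T},
\qquad H_j = I - \tau_j v_j v_j^{T}

% Column-by-column construction of Y and T
Y_1 = v_1, \qquad T_1 = \tau_1,
\qquad
Y_{j+1} = \begin{bmatrix} Y_j & v_{j+1} \end{bmatrix},
\qquad
T_{j+1} = \begin{bmatrix}
  T_j & -\tau_{j+1}\, T_j Y_j^{T} v_{j+1} \\
  0   & \tau_{j+1}
\end{bmatrix}

% Applying Q^T to a block A uses only matrix-matrix products
Q^{T} A = A - Y \left( T^{T} \left( Y^{T} A \right) \right)
```

Under these assumptions, updating the trailing part of the matrix costs three matrix-matrix products per block of k reflectors instead of k separate rank-one updates, which is the property a blocked parallel algorithm would exploit.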
