Distributed Orthogonal Factorization

Several algorithms for orthogonal factorization on distributed memory multiprocessors are designed and implemented. Two of the algorithms employ Householder transformations, a third is based on Givens rotations, and a fourth, hybrid algorithm uses Householder transformations and Givens rotations in different phases. The arithmetic and communication complexities of the algorithms are analyzed. The analyses show that the sequential arithmetic terms play a more important role than the communication terms in determining the running times and efficiencies of these algorithms. The hybrid algorithm is the fastest overall, since its arithmetic cost is lower than that of the Householder algorithms and its communication cost does not increase with the column length of the matrix. The observed execution times of the implementations on an iPSC-286 agree quite well with the complexity analyses. It is also shown that the efficiencies can be approximated using only the arithmetic costs of the algorithms.
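
For readers unfamiliar with the two building blocks named above, the following is a minimal sequential sketch in Python of a Householder-based and a Givens-based reduction of a dense matrix to upper triangular form. It is an illustration only: it is not the distributed algorithms analyzed here, it ignores data distribution and communication entirely, and the function names `householder_qr` and `givens_qr` are hypothetical names chosen for this sketch.

```python
import math


def householder_qr(a):
    """Return the triangular factor R of A = QR using Householder reflections.

    `a` is a list of rows; Q is applied implicitly and not formed.
    """
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for j in range(min(m - 1, n)):
        # Part of column j on and below the diagonal.
        x = [r[i][j] for i in range(j, m)]
        norm = math.sqrt(sum(t * t for t in x))
        if norm == 0.0:
            continue  # column is already zero, nothing to eliminate
        # v = x + sign(x0) * ||x|| * e1; this sign choice avoids cancellation.
        v = x[:]
        v[0] += math.copysign(norm, x[0])
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to the trailing columns.
        for k in range(j, n):
            dot = sum(v[i] * r[j + i][k] for i in range(m - j))
            f = 2.0 * dot / vtv
            for i in range(m - j):
                r[j + i][k] -= f * v[i]
    return r


def givens_qr(a):
    """Return the triangular factor R of A = QR using Givens rotations."""
    r = [row[:] for row in a]
    m, n = len(r), len(r[0])
    for j in range(n):
        # Zero the subdiagonal of column j from the bottom up,
        # rotating each entry into the row directly above it.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries are zero, no rotation needed
            c, s = x / h, y / h
            for k in range(j, n):
                top, bot = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * top + s * bot
                r[i][k] = -s * top + c * bot
    return r
```

Both routines overwrite a copy of the input with an upper triangular factor; for a matrix of full column rank the two results agree up to the signs of their rows. A Householder step eliminates a whole column at once, whereas a Givens step eliminates one entry using only two rows, which is the structural difference a hybrid scheme can exploit in different phases.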