A Ring-Oriented Approach for Block Matrix Factorizations on Shared and Distributed Memory Architectures

A block (column) wrap-mapping approach to the design of parallel block matrix factorization algorithms that are (trans)portable across and between shared memory multiprocessors (SMM) and distributed memory multicomputers (DMM) is presented. By reorganizing the matrix on the SMM architecture, the same ring-oriented algorithms can be used on both SMM and DMM systems, with all machine dependencies confined to a small set of communication routines. The algorithms are described at a high level, with a focus on portability and scalability. Implementation aspects of the LU, Cholesky, and QR factorizations and machine-specific communication routines for some SMM and DMM systems are discussed. Timing results show that our portable algorithms achieve performance similar to that of machine-specific implementations.

1 Introduction

With the introduction of advanced parallel computer architectures, a demand for efficient and portable algorithms has emerged. Several attempts to design algorithms and implementat.