Paper Information - A Ring-Oriented Approach for Block Matrix Factorizations on Shared and Distributed Memory Architectures

A block (column) wrap-mapping approach for the design of parallel block matrix factorization algorithms that are (trans)portable over and between shared memory multiprocessors (SMM) and distributed memory multicomputers (DMM) is presented. By reorganizing the matrix on the SMM architecture, the same ring-oriented algorithms can be used on both SMM and DMM systems, with all machine dependencies confined to a small set of communication routines. The algorithms are described at a high level, with a focus on portability and scalability aspects. Implementation aspects of the LU, Cholesky, and QR factorizations and machine-specific communication routines for some SMM and DMM systems are discussed. Timing results show that our portable algorithms perform similarly to machine-specific implementations.

1 Introduction

With the introduction of advanced parallel computer architectures, a demand for efficient and portable algorithms has emerged. Several attempts to design algorithms and implementat.
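The block (column) wrap mapping named in the abstract can be illustrated with a minimal sketch. It assumes the common reading of wrap mapping, in which block column j is owned by processor j mod P and each processor passes data to its successor on a ring. The function names `wrap_map` and `ring_successor` are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
def wrap_map(n_cols, block_size, n_procs):
    """Assign block columns to processors cyclically (wrap mapping).

    Assumption: block column j belongs to processor j mod n_procs.
    Returns a dict mapping each processor id to a list of
    (first_col, last_col_exclusive) ranges that it owns.
    """
    n_blocks = (n_cols + block_size - 1) // block_size  # ceiling division
    owners = {p: [] for p in range(n_procs)}
    for j in range(n_blocks):
        first = j * block_size
        last = min(first + block_size, n_cols)  # the last block may be narrower
        owners[j % n_procs].append((first, last))
    return owners


def ring_successor(p, n_procs):
    """Return the next processor on the ring (hypothetical helper)."""
    return (p + 1) % n_procs


if __name__ == "__main__":
    # 10 columns, block width 2, 3 processors
    for proc, blocks in wrap_map(10, 2, 3).items():
        print(proc, blocks, "-> sends to", ring_successor(proc, 3))
```

Under this assumption, consecutive block columns land on consecutive processors, so the owner of the next panel in a factorization is always the ring successor of the current owner.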

Erik Elmroth | Krister Dackland | Bo Kågström

[1] G. A. Geist, et al. PICL: Portable Instrumented Communication Library, 1990.

[2] Erik Elmroth, et al. Ring-oriented block matrix factorization algorithms for shared and distributed memory architectures, 1992.

[3] G. C. Fox, et al. Solving Problems on Concurrent Processors, 1988.

[4] Michael Stumm, et al. Algorithms implementing distributed shared memory, 1990, Computer.

[5] Jack J. Dongarra, et al. A set of level 3 basic linear algebra subprograms, 1990, TOMS.

[6] Kai Li, et al. Shared virtual memory on loosely coupled multiprocessors, 1986.

[7] Bo Kågström, et al. GEMM-Based Level-3 BLAS, 1991.

[8] James H. Patterson, et al. Portable Programs for Parallel Processors, 1987.

[9] K. A. Gallivan, et al. Parallel Algorithms for Dense Linear Algebra Computations, 1990, SIAM Rev.

[10] Erik Elmroth, et al. Parallel block matrix factorizations for distributed memory multicomputers, 1992.

[11] Vijay P. Kumar, et al. Analyzing Scalability of Parallel Algorithms and Architectures, 1994, J. Parallel Distributed Comput.

[12] Erik Elmroth, et al. Parallel Block Matrix Factorizations on the Shared-Memory Multiprocessor IBM 3090 VF/600J, 1992.

[13] R. van de Geijn, et al. A look at scalable dense linear algebra libraries, 1992, Proceedings Scalable High Performance Computing Conference SHPCC-92.

[14] Robert E. Benner, et al. Development of Parallel Methods for a 1024-Processor Hypercube, 1988.