Paper information - Parallel matrix transpose algorithms on distributed memory concurrent computers

Parallel matrix transpose algorithms on distributed memory concurrent computers

This paper describes parallel matrix transpose algorithms on distributed memory concurrent processors. We assume that the matrix is distributed over a P × Q processor template with a block scattered data distribution. P, Q, and the block size can be arbitrary, so the algorithms have wide applicability. The algorithms make use of nonblocking, point-to-point communication between processors. The use of nonblocking communication allows a processor to overlap the messages that it sends to different processors, thereby avoiding unnecessary synchronization. Combined with the matrix multiplication routine, C = A · B, the algorithms are used to compute parallel multiplications of transposed matrices, C = Aᵀ · Bᵀ, in the PUMMA package. Details of the parallel implementation of the algorithms are given, and results are presented for runs on the Intel Touchstone Delta computer.
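To make the data movement behind the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes the usual block scattered (block-cyclic) convention in which block (I, J) is held by processor (I mod P, J mod Q), and it assumes the transposed matrix is stored on the same P × Q template. The names `owner` and `transpose_traffic` are hypothetical helpers introduced for illustration only. The sketch lists which processor pairs would have to exchange blocks, which is the point-to-point traffic that nonblocking sends could overlap.

```python
from collections import defaultdict


def owner(block_row, block_col, P, Q):
    """Processor (p, q) holding a block under the assumed block scattered layout."""
    return (block_row % P, block_col % Q)


def transpose_traffic(Mb, Nb, P, Q):
    """Group the blocks of an Mb x Nb block matrix A by (source, destination) pair.

    Block (I, J) of A becomes block (J, I) of A^T, so it moves from
    owner(I, J) to owner(J, I). Illustrative only; no communication happens here.
    """
    traffic = defaultdict(list)
    for I in range(Mb):
        for J in range(Nb):
            src = owner(I, J, P, Q)  # processor holding block (I, J) of A
            dst = owner(J, I, P, Q)  # processor that will hold block (J, I) of A^T
            traffic[(src, dst)].append((I, J))
    return dict(traffic)


if __name__ == "__main__":
    # Hypothetical sizes: a 4 x 4 block matrix on a 2 x 3 processor template.
    for (src, dst), blocks in sorted(transpose_traffic(4, 4, 2, 3).items()):
        print(f"{src} -> {dst}: {len(blocks)} block(s)")
```

Each key of the returned dictionary corresponds to one message a processor would send. Because the messages from one source to different destinations are independent of each other, a processor could post all of them without waiting in between.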