A general scalable implementation of fast matrix multiplication algorithms on distributed memory computers

Fast matrix multiplication (FMM) algorithms reduce the asymptotic operation count for multiplying two $n \times n$ matrices from the $O(n^3)$ of the traditional algorithm to as low as $O(n^{2.38})$. On distributed memory computers, combining FMM algorithms with parallel matrix multiplication algorithms has therefore consistently produced remarkable results. Within this combination, applying FMM algorithms at the inter-processor level poses harder design problems, but it yields the most effective algorithms. This paper presents a general model of these algorithms and introduces a scalable method for implementing the model on distributed memory computers.
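
The claim that FMM algorithms cut the operation count below the traditional $O(n^3)$ can be illustrated with Strassen's algorithm, the classic FMM scheme. It replaces the eight half-size block products of the straightforward recursive method with seven, giving $O(n^{\log_2 7}) \approx O(n^{2.81})$. The $O(n^{2.38})$ bound quoted above comes from far more intricate constructions that are not shown here. What follows is a minimal sequential sketch, assuming square matrices whose size is a power of two. The names `strassen`, `classical`, `mult_count` and the `cutoff` parameter are illustrative assumptions, not taken from the paper, and the sketch does not model the paper's general model or its distribution method.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized square matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def classical(a: Matrix, b: Matrix) -> Matrix:
    """Traditional O(n^3) triple-loop product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(m: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split an n x n matrix (n even) into four n/2 x n/2 quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def join(c11: Matrix, c12: Matrix, c21: Matrix, c22: Matrix) -> Matrix:
    """Reassemble four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom


def strassen(a: Matrix, b: Matrix, cutoff: int = 2) -> Matrix:
    """Strassen's algorithm for n x n matrices, n a power of two.

    At or below `cutoff` it falls back to the classical product.
    """
    n = len(a)
    if n <= cutoff:
        return classical(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    # Seven half-size products instead of eight; none depends on another.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    # Combine the products into the four quadrants of C = A * B.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    return join(c11, c12, c21, c22)


def mult_count(n: int, cutoff: int = 1) -> int:
    """Scalar multiplications Strassen performs for n x n (n a power of two)."""
    if n <= cutoff:
        return n ** 3
    return 7 * mult_count(n // 2, cutoff)


if __name__ == "__main__":
    n = 8
    a = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    b = [[random.randint(-5, 5) for _ in range(n)] for _ in range(n)]
    # Integer entries make the comparison exact.
    assert strassen(a, b) == classical(a, b)
    print(mult_count(n), n ** 3)  # 343 512
```

Because the seven products `m1` through `m7` do not depend on one another, a distributed implementation can, in principle, assign them to different processors or processor groups. The specific scheme the paper uses for applying FMM at the inter-processor level is not reproduced by this sketch.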
