A general scalable implementation of fast matrix multiplication algorithms on distributed memory computers

Fast matrix multiplication (FMM) algorithms for multiplying two n × n matrices reduce the asymptotic operation count from the O(n^3) of the traditional algorithm to O(n^2.38). On distributed memory computers, combining FMM algorithms with parallel matrix multiplication algorithms therefore gives remarkable results. Within this combination, applying FMM algorithms at the inter-processor level poses harder design problems, but it yields the most effective algorithms. In this paper, we present a general model of these algorithms and introduce a scalable method for implementing the model on distributed memory computers.
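To make the operation-count reduction concrete, the following is a minimal single-process sketch of Strassen's algorithm, the best-known FMM algorithm, which replaces the 8 block products of the traditional 2 × 2 block scheme with 7 and so reaches O(n^2.81). It is an illustration only and not the distributed model presented in this paper: the function names, the `cutoff` parameter, and the list-of-lists matrix representation are assumptions made for the example.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def classical(A, B):
    """Traditional O(n^3) product of two n x n matrices."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Strassen product; falls back to the classical product for small
    or odd-sized blocks (cutoff is a hypothetical tuning parameter)."""
    n = len(A)
    if n <= cutoff or n % 2:
        return classical(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive block products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    # Recombine the seven products into the four quadrants of C.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


if __name__ == "__main__":
    n = 8
    A = [[i + j for j in range(n)] for i in range(n)]
    B = [[i - j for j in range(n)] for i in range(n)]
    print(strassen(A, B) == classical(A, B))
```

In a distributed setting the seven block products would be assigned to different processors or processor groups; the sketch leaves that mapping out entirely.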
