Communication-efficient matrix multiplication on hypercubes

In this paper we present an efficient dense matrix multiplication algorithm for distributed-memory computers with a hypercube topology. The proposed algorithm performs better than all previously proposed algorithms over a wide range of matrix sizes and processor counts, especially for large matrices. We analyze the performance of the algorithms on two types of hypercube architecture: one in which each node can use at most one communication link at a time (to send and receive), and one in which each node can use all of its communication links simultaneously.
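To make the two architecture models concrete, the following is a minimal sketch of hypercube addressing and of a standard binomial-tree broadcast under the one-port restriction. It is not the algorithm proposed in the paper; the function names `hypercube_neighbors` and `one_port_broadcast_schedule` are hypothetical and introduced only for illustration.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the neighbors of `node` in a dim-dimensional hypercube.

    Nodes are labeled 0 .. 2**dim - 1; two nodes are linked exactly when
    their binary labels differ in one bit.
    """
    return [node ^ (1 << k) for k in range(dim)]


def one_port_broadcast_schedule(root: int, dim: int) -> list[list[tuple[int, int]]]:
    """Binomial-tree broadcast from `root` (illustrative, one-port model).

    In step k, every node that already holds the data sends it across
    dimension k, so each node uses at most one link per step and the
    broadcast completes in `dim` steps.
    """
    have = [root]                      # nodes that currently hold the data
    schedule = []
    for k in range(dim):
        step = [(src, src ^ (1 << k)) for src in have]
        schedule.append(step)
        have += [dst for _, dst in step]
    return schedule


if __name__ == "__main__":
    print(hypercube_neighbors(5, 3))
    for k, step in enumerate(one_port_broadcast_schedule(0, 3)):
        print(f"step {k}: {step}")
```

Under the multi-port model, a node may drive all `dim` of its links in the same step, so a schedule is no longer limited to one transfer per node per step; that extra freedom is what distinguishes the second architecture from the first.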
