Scalability of Parallel Algorithms for Matrix Multiplication

A number of parallel formulations of the dense matrix multiplication algorithm have been developed. For an arbitrarily large number of processors, any of these algorithms or their variants can provide near-linear speedup for sufficiently large matrix sizes, and none of the algorithms can be clearly claimed to be superior to the others. In this paper we analyze the performance and scalability of a number of parallel formulations of the matrix multiplication algorithm and predict the conditions under which each formulation is better than the others.
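The claim that speedup approaches linear for sufficiently large matrices can be illustrated with a small cost-model sketch. The code below is not taken from the paper. It assumes a commonly used textbook-style model for Cannon's algorithm on p processors arranged as a square mesh, with time measured in units of one multiply-add. The function names and the values of the message startup cost t_s and the per-word transfer cost t_w are hypothetical and chosen only for illustration.

```python
import math


def cannon_time(n, p, t_s=150.0, t_w=3.0):
    """Modeled parallel time for an n x n product on p processors.

    Assumed model (illustrative, not the paper's exact expression):
        T_p = n^3 / p + 2 * sqrt(p) * t_s + 2 * t_w * n^2 / sqrt(p)
    t_s and t_w are hypothetical machine parameters.
    """
    compute = n ** 3 / p
    communicate = 2 * math.sqrt(p) * t_s + 2 * t_w * n ** 2 / math.sqrt(p)
    return compute + communicate


def speedup(n, p, **params):
    """Serial time n^3 divided by the modeled parallel time."""
    return n ** 3 / cannon_time(n, p, **params)


def efficiency(n, p, **params):
    """Speedup per processor; values near 1 mean near-linear speedup."""
    return speedup(n, p, **params) / p


if __name__ == "__main__":
    p = 16
    for n in (64, 256, 1024):
        print(f"n={n:5d}  p={p}  efficiency={efficiency(n, p):.3f}")
```

Under this assumed model, efficiency on a fixed number of processors rises toward 1 as n grows, because the computation term grows as n^3 while the communication terms grow no faster than n^2. Comparing formulations in the way the abstract describes would amount to substituting each algorithm's own cost expression for cannon_time.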
