Parallel implementation of Strassen's matrix multiplication algorithm for heterogeneous clusters

Summary form only given. We propose a new distribution scheme for a parallel Strassen's matrix multiplication algorithm on heterogeneous clusters. In a heterogeneous cluster environment, appropriate data distribution is the most important factor in achieving maximum overall performance. Strassen's algorithm, however, reduces the total operation count to about 7/8 of its previous value with each level of recursion, so the recursion level also affects the total operation count. We therefore need to consider not only load balancing but also the recursion level in Strassen's algorithm. Our scheme achieves both load balancing and a reduction of the total operation count. As a result, we achieve a speedup of nearly 21.7% compared to the conventional parallel Strassen's algorithm in a heterogeneous cluster environment.
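The 7/8 factor per recursion level can be illustrated with a rough operation count. The sketch below is not the distribution scheme proposed here; it only counts scalar operations, assuming the standard Strassen formulation (7 half-size products and 18 half-size matrix additions per level) and a classical multiply costing 2n^3 - n^2 operations at the base of the recursion. The function names `classical_flops` and `strassen_flops` are hypothetical, chosen for illustration.

```python
def classical_flops(n: int) -> int:
    """Scalar operations for a classical n x n multiply:
    n^3 multiplications plus n^2 * (n - 1) additions."""
    return 2 * n**3 - n**2


def strassen_flops(n: int, levels: int) -> int:
    """Scalar operations for Strassen's algorithm applied for `levels`
    recursion levels, falling back to the classical multiply below that.

    Assumes the standard formulation: 7 recursive half-size products
    and 18 additions/subtractions of half-size matrices per level.
    """
    if levels == 0:
        return classical_flops(n)
    if n % 2 != 0:
        raise ValueError("n must be divisible by 2 at every recursion level")
    half = n // 2
    return 7 * strassen_flops(half, levels - 1) + 18 * half**2


if __name__ == "__main__":
    n = 1024
    base = classical_flops(n)
    for levels in range(4):
        ratio = strassen_flops(n, levels) / base
        # The ratio stays close to (7/8)**levels; the extra additions
        # account for the small difference.
        print(f"levels={levels}  ratio={ratio:.4f}  (7/8)^levels={(7 / 8) ** levels:.4f}")
```

Each extra level lowers the count by roughly another factor of 7/8, which is why the recursion level has to be weighed together with load balancing when the matrices are split across processors of different speeds.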
