A polyalgorithmic approach applied for fast matrix multiplication on clusters

Summary form only given. Parallel execution platforms are increasingly diverse, and solving a target problem with a single algorithm is not always efficient on every platform. We present a polyalgorithmic approach that selects the most suitable algorithm among several candidates for a given problem size and set of available resources. Our principal objective is to illustrate this approach on matrix multiplication, one of the most important basic numerical kernels. More precisely, we propose a polyalgorithm that combines the advantages of standard and fast algorithms and automatically chooses a suitable algorithm for multiplying matrices of any dimension on a particular parallel system. We target homogeneous clusters of PCs and report some experiments.
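As an illustration only, the selection idea can be sketched sequentially in Python. The summary does not give the actual selection criteria or the parallel algorithms used, so everything here is an assumption: the names `choose_algorithm`, `multiply`, and `cutoff` are hypothetical, the "fast" algorithm is taken to be Strassen's recursion, and the rule (use Strassen only for square, even-order operands larger than `cutoff`) is a placeholder for whatever size and resource model a real polyalgorithm would use.

```python
def standard_multiply(a, b):
    """Classical product of an n x k matrix and a k x m matrix (lists of rows)."""
    k, m = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def add(x, y):
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]


def sub(x, y):
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]


def split(mat):
    """Split an even-order square matrix into its four quadrants."""
    h = len(mat) // 2
    return ([r[:h] for r in mat[:h]], [r[h:] for r in mat[:h]],
            [r[:h] for r in mat[h:]], [r[h:] for r in mat[h:]])


def choose_algorithm(n, k, m, cutoff):
    """Hypothetical selection rule: Strassen only for large, square, even-order inputs."""
    if n == k == m and n % 2 == 0 and n > cutoff:
        return "strassen"
    return "standard"


def strassen_multiply(a, b, cutoff):
    """One level of Strassen's recursion; sub-products go back through multiply()."""
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    def rec(x, y):
        return multiply(x, y, cutoff)

    m1 = rec(add(a11, a22), add(b11, b22))
    m2 = rec(add(a21, a22), b11)
    m3 = rec(a11, sub(b12, b22))
    m4 = rec(a22, sub(b21, b11))
    m5 = rec(add(a11, a12), b22)
    m6 = rec(sub(a21, a11), add(b11, b12))
    m7 = rec(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Reassemble the four quadrants into one matrix.
    return [r + s for r, s in zip(c11, c12)] + [r + s for r, s in zip(c21, c22)]


def multiply(a, b, cutoff=64):
    """Polyalgorithm entry point: pick an algorithm from the operand shapes."""
    n, k, m = len(a), len(b), len(b[0])
    if choose_algorithm(n, k, m, cutoff) == "strassen":
        return strassen_multiply(a, b, cutoff)
    return standard_multiply(a, b)
```

Calling `multiply(a, b, cutoff=2)` on two 4 x 4 matrices takes the Strassen branch once and then falls back to the classical product on the 2 x 2 blocks. Rectangular or odd-order inputs always take the classical branch in this sketch. A cluster implementation would also have to weigh the number of processes and communication costs, which this sketch ignores.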
