A parallel implementation of Strassen’s matrix multiplication algorithm for wormhole-routed all-port 2D torus networks

A new parallel implementation of Strassen’s matrix multiplication algorithm is proposed for massively parallel supercomputers with 2D all-port torus interconnection networks. The proposed algorithm employs a special conflict-free routing pattern for better scalability and achieves a performance rate very close to the theoretical bound for many practical network and matrix sizes. It scales effectively to very large networks, typically containing hundreds of thousands of processors, where petaflop or exaflop processing rates are sought.
