Mixed Parallel Implementations of Strassen and Winograd Matrix Multiplication Algorithms

[1]  Bernard Tourancheau, et al. Fast Runtime Block Cyclic Data Redistribution on Multiprocessors, 1997, J. Parallel Distributed Comput.

[2]  Patrick C. Fischer, et al. Efficient Procedures for Using Matrix Algorithms, 1974, ICALP.

[3]  Eddy Caron, et al. Performance Prediction and Analysis of Parallel Out-Of-Core Matrix Factorization, 2000, HiPC.

[4]  Robert A. van de Geijn, et al. A High Performance Parallel Strassen Implementation, 1995, Parallel Process. Lett.

[5]  Thomas Rauber, et al. Scheduling of Data Parallel Modules for Scientific Computing, 1999, PPSC.

[6]  Bogdan Dumitrescu, et al. Fast Matrix Multiplication Algorithms on MIMD Architectures, 1994, Parallel Algorithms Appl.

[7]  V. Strassen. Gaussian elimination is not optimal, 1969.

[8]  Mithuna Thottethodi, et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency, 1998, Proceedings of the IEEE/ACM SC98 Conference.

[9]  Nicholas J. Higham, et al. Exploiting fast matrix multiplication within the level 3 BLAS, 1990, TOMS.

[10]  Prithviraj Banerjee, et al. Simultaneous exploitation of task and data parallelism in regular scientific applications, 1996.

[11]  Bo Kågström, et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, TOMS.

[12]  Mithuna Thottethodi, et al. Recursive array layouts and fast parallel matrix multiplication, 1999, SPAA '99.

[13]  Matthew Haines, et al. Approaches for integrating task and data parallelism, 1998, IEEE Concurr.