Mixed Parallel Implementations of Strassen and Winograd Matrix Multiplication Algorithms
[1] Bernard Tourancheau, et al. Fast Runtime Block Cyclic Data Redistribution on Multiprocessors, 1997, J. Parallel Distributed Comput.
[2] Patrick C. Fischer, et al. Efficient Procedures for Using Matrix Algorithms, 1974, ICALP.
[3] Eddy Caron, et al. Performance Prediction and Analysis of Parallel Out-Of-Core Matrix Factorization, 2000, HiPC.
[4] Robert A. van de Geijn, et al. A High Performance Parallel Strassen Implementation, 1995, Parallel Process. Lett.
[5] Thomas Rauber, et al. Scheduling of Data Parallel Modules for Scientific Computing, 1999, PPSC.
[6] Bogdan Dumitrescu, et al. Fast Matrix Multiplication Algorithms on MIMD Architectures, 1994, Parallel Algorithms Appl.
[7] V. Strassen. Gaussian elimination is not optimal, 1969.
[8] Mithuna Thottethodi, et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[9] Nicholas J. Higham, et al. Exploiting fast matrix multiplication within the level 3 BLAS, 1990, TOMS.
[10] Prithviraj Banerjee, et al. Simultaneous exploitation of task and data parallelism in regular scientific applications, 1996.
[11] Bo Kågström, et al. GEMM-based level 3 BLAS: high-performance model implementations and performance evaluation benchmark, 1998, TOMS.
[12] Mithuna Thottethodi, et al. Recursive array layouts and fast parallel matrix multiplication, 1999, SPAA '99.
[13] Matthew Haines, et al. Approaches for integrating task and data parallelism, 1998, IEEE Concurr.