Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach
James Demmel | Shoaib Kamil | Armando Fox | Oded Schwartz | Omer Spillinger | David Eliahu | Benjamin Lipshitz
[1] James Demmel, et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[2] James Demmel, et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms, 2011, Euro-Par.
[3] John Shalf, et al. SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization, 2010.
[4] James Demmel, et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds, 2012, SPAA '12.
[5] Matteo Frigo, et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[6] Dror Irony, et al. Communication lower bounds for distributed-memory matrix multiplication, 2004, J. Parallel Distributed Comput.
[7] James Demmel, et al. Communication-optimal parallel algorithm for Strassen's matrix multiplication, 2012, SPAA '12.
[8] James Demmel, et al. Communication-Avoiding Parallel Strassen: Implementation and performance, 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.