Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication
James Demmel | Shoaib Kamil | Armando Fox | Oded Schwartz | Omer Spillinger | David Eliahu | Benjamin Lipshitz
[1] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[2] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[3] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[4] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[5] Gang Ren,et al. Is Search Really Necessary to Generate High-Performance BLAS? , 2005, Proceedings of the IEEE.
[6] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[7] Vijaya Ramachandran,et al. Oblivious algorithms for multicores and network of processors , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[8] James Demmel,et al. Communication-optimal parallel algorithm for Strassen's matrix multiplication , 2012, SPAA '12.
[9] James Demmel,et al. Communication-Avoiding Parallel Strassen: Implementation and performance , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[10] Don Coppersmith,et al. Rectangular Matrix Multiplication Revisited , 1997, J. Complex..
[11] Volker Strassen,et al. Algebraic Complexity Theory , 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.
[12] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[13] John Shalf,et al. SEJITS: Getting Productivity and Performance With Selective Embedded JIT Specialization , 2010 .
[14] Michael Clausen,et al. Algebraic complexity theory , 1997, Grundlehren der mathematischen Wissenschaften.
[15] Victor Y. Pan,et al. Fast Rectangular Matrix Multiplication and Applications , 1998, J. Complex..
[16] Geppino Pucci,et al. Network-Oblivious Algorithms , 2007, IPDPS.
[17] H. T. Kung,et al. I/O complexity: The red-blue pebble game , 1981, STOC '81.
[18] Grazia Lotti,et al. O(n^2.7799) Complexity for n × n Approximate Matrix Multiplication , 1979, Inf. Process. Lett..
[19] Erik Elmroth,et al. Recursive Blocked Algorithms and Hybrid Data Structures for Dense Matrix Library Software , 2004, SIAM Rev..
[20] James Demmel,et al. Improving communication performance in dense linear algebra via topology aware collectives , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[21] Jack J. Dongarra,et al. A Portable Programming Interface for Performance Evaluation on Modern Processors , 2000, Int. J. High Perform. Comput. Appl..
[22] Keshav Pingali,et al. An experimental comparison of cache-oblivious and cache-conscious programs , 2007, SPAA '07.
[23] D. Coppersmith. Rapid Multiplication of Rectangular Matrices , 2014 .
[24] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[25] L. R. Kerr,et al. On Minimizing the Number of Multiplications Necessary for Matrix Multiplication , 1969 .
[26] Grazia Lotti,et al. On the Asymptotic Complexity of Rectangular Matrix Multiplication , 1983, Theor. Comput. Sci..
[27] Matteo Frigo,et al. Cache-oblivious algorithms , 1999, 40th Annual Symposium on Foundations of Computer Science (Cat. No.99CB37039).
[28] P. Sadayappan,et al. Communication-Efficient Matrix Multiplication on Hypercubes , 1996, Parallel Comput..
[29] James Demmel,et al. Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication , 2012, MedAlg.
[30] V. Strassen. Gaussian elimination is not optimal , 1969 .
[31] Victor Y. Pan,et al. Fast rectangular matrix multiplications and improving parallel matrix computations , 1997, PASCO '97.
[32] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[33] James Demmel,et al. Matrix Multiplication on Multidimensional Torus Networks , 2012, VECPAR.
[34] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.