Communication-Avoiding Parallel Recursive Algorithms for Matrix Multiplication
[1] James Demmel,et al. Minimizing Communication in Numerical Linear Algebra , 2009, SIAM J. Matrix Anal. Appl..
[2] Jehoshua Bruck,et al. Efficient algorithms for all-to-all communications in multi-port message-passing systems , 1994, SPAA '94.
[3] Keshav Pingali,et al. Automatic Generation of Block-Recursive Codes , 2000, Euro-Par.
[4] John R. Gilbert,et al. The Combinatorial BLAS: design, implementation, and applications , 2011, Int. J. High Perform. Comput. Appl..
[5] Robert L. Probert. On the Additive Complexity of Matrix Multiplication , 1976, SIAM J. Comput..
[6] Alexander Tiskin. Communication-efficient parallel generic pairwise elimination , 2007, Future Gener. Comput. Syst..
[7] Michael Clausen,et al. Algebraic complexity theory , 1997, Grundlehren der mathematischen Wissenschaften.
[8] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[9] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[10] Victor Y. Pan,et al. Fast rectangular matrix multiplications and improving parallel matrix computations , 1997, PASCO '97.
[11] Vijaya Ramachandran,et al. Oblivious algorithms for multicores and network of processors , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[12] Nader H. Bshouty,et al. On the Additive Complexity of 2 × 2 Matrix Multiplication , 1995, Inf. Process. Lett..
[13] Shmuel Winograd,et al. On multiplication of 2 × 2 matrices , 1971 .
[14] James Demmel,et al. Communication-optimal parallel algorithm for Strassen's matrix multiplication , 2012, SPAA '12.
[15] Jaeyoung Choi,et al. PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers , 1994, Concurr. Pract. Exp..
[16] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[17] Victor Y. Pan,et al. Fast Rectangular Matrix Multiplication and Applications , 1998, J. Complex..
[18] James Demmel,et al. Communication optimal parallel multiplication of sparse random matrices , 2013, SPAA.
[19] James Demmel,et al. Communication-Avoiding Parallel Strassen: Implementation and performance , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[20] James Demmel,et al. Graph expansion and communication costs of fast matrix multiplication: regular submission , 2011, SPAA '11.
[21] V. Strassen. Gaussian elimination is not optimal , 1969 .
[22] Robert A. van de Geijn,et al. A flexible class of parallel matrix multiplication algorithms , 1998, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing.
[23] Grazia Lotti,et al. O(n^2.7799) Complexity for n × n Approximate Matrix Multiplication , 1979, Inf. Process. Lett..
[24] Jaeyoung Choi,et al. A new parallel matrix multiplication algorithm on distributed‐memory concurrent computers , 1998 .
[25] Qingshan Luo,et al. A scalable parallel Strassen's matrix multiplication algorithm for distributed-memory computers , 1995, SAC '95.
[26] Lynn Elliot Cannon,et al. A cellular computer to implement the Kalman filter algorithm , 1969 .
[27] Grazia Lotti,et al. On the Asymptotic Complexity of Rectangular Matrix Multiplication , 1983, Theor. Comput. Sci..
[28] M. Challacombe. A general parallel sparse-blocked matrix multiply for linear scaling SCF theory , 2000 .
[29] Robert A. van de Geijn,et al. A High Performance Parallel Strassen Implementation , 1995, Parallel Process. Lett..
[30] Marc Snir,et al. Getting Up to Speed: The Future of Supercomputing , 2004 .
[31] James Demmel,et al. Communication-optimal Parallel and Sequential Cholesky Decomposition , 2009, SIAM J. Sci. Comput..
[32] H. Whitney,et al. An inequality related to the isoperimetric inequality , 1949 .
[33] James Demmel,et al. Fast matrix multiplication is stable , 2006, Numerische Mathematik.
[34] D. Coppersmith. Rapid Multiplication of Rectangular Matrices , 2014 .
[35] Alexander Tiskin,et al. All-Pairs Shortest Paths Computation in the BSP Model , 2001, ICALP.
[36] James Demmel,et al. Perfect Strong Scaling Using No Additional Energy , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[37] Jaeyoung Choi. A new parallel matrix multiplication algorithm on distributed-memory concurrent computers , 1998, Concurr. Pract. Exp..
[38] Katherine A. Yelick,et al. Communication avoiding and overlapping for numerical linear algebra , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[39] Don Coppersmith,et al. Rectangular Matrix Multiplication Revisited , 1997, J. Complex..
[40] Alexander Tiskin,et al. Memory-Efficient Matrix Multiplication in the BSP Model , 1999, Algorithmica.
[41] Nicholas J. Higham,et al. Inverse Problems Newsletter , 1991 .
[42] Katherine A. Yelick,et al. A Communication-Optimal N-Body Algorithm for Direct Interactions , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[43] James Demmel,et al. Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds , 2012, SPAA '12.
[44] James Demmel,et al. Fast linear algebra is stable , 2006, Numerische Mathematik.
[45] John Shalf,et al. Exascale Computing Technology Challenges , 2010, VECPAR.
[46] Kyriakos Kalorkoti. Algebraic Complexity Theory (Grundlehren der Mathematischen Wissenschaften 315) , 1999 .
[47] James Demmel,et al. Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout , 2013, SPAA.
[48] John R. Gilbert,et al. Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments , 2011, SIAM J. Sci. Comput..
[49] Raphael Yuster,et al. Fast sparse matrix multiplication , 2004, TALG.
[50] John R. Gilbert,et al. Challenges and Advances in Parallel Sparse Matrix-Matrix Multiplication , 2008, 2008 37th International Conference on Parallel Processing.
[51] Barton P. Miller,et al. Critical path analysis for the execution of parallel and distributed programs , 1988, [1988] Proceedings. The 8th International Conference on Distributed.
[52] L. R. Kerr,et al. On Minimizing the Number of Multiplications Necessary for Matrix Multiplication , 1969 .
[53] Hans Werner Meuer,et al. Top500 Supercomputer Sites , 1997 .
[54] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[55] Larry Rudolph,et al. Techniques for Parallel Manipulation of Sparse Matrices , 1989, Theor. Comput. Sci..
[56] James Demmel,et al. Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication , 2012, MedAlg.
[57] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[58] P. Sadayappan,et al. Communication-Efficient Matrix Multiplication on Hypercubes , 1996, Parallel Comput..
[59] Sivan Toledo,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004 .
[60] James Demmel,et al. Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[61] Julian D. Laderman,et al. On practical algorithms for accelerated matrix multiplication , 1992 .
[62] Volker Strassen,et al. Algebraic Complexity Theory , 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.
[63] Geppino Pucci,et al. Network-Oblivious Algorithms , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[64] Charles E. Leiserson,et al. Cache-Oblivious Algorithms , 2003, CIAC.
[65] Viktor K. Prasanna,et al. Optimizing graph algorithms for improved cache performance , 2002, IEEE Transactions on Parallel and Distributed Systems.
[66] Nedjeljko Frančula. The National Academies Press , 2013 .
[67] David S. Wise,et al. Seven at one stroke: results from a cache-oblivious paradigm for scalable matrix algorithms , 2006, MSPC '06.
[68] James Demmel,et al. Minimizing Communication in All-Pairs Shortest Paths , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[69] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[70] Samuel H. Fuller,et al. The Future of Computing Performance: Game Over or Next Level? , 2014 .