Matrix Multiplication on Multidimensional Torus Networks
暂无分享,去创建一个
[1] James Demmel,et al. Minimizing Communication in Linear Algebra , 2009, ArXiv.
[2] James Demmel,et al. Improving communication performance in dense linear algebra via topology aware collectives , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[3] Jarle Berntsen,et al. Communication efficient matrix multiplication on hypercubes , 1989, Parallel Comput..
[4] S. Lennart Johnsson,et al. Minimizing the Communication Time for Matrix Multiplication on Multiprocessors , 1993, Parallel Comput..
[5] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[6] Dror Irony,et al. Communication lower bounds for distributed-memory matrix multiplication , 2004, J. Parallel Distributed Comput..
[7] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[8] Philip Heidelberger,et al. The IBM Blue Gene/Q interconnection network and message unit , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[9] Ibm Blue,et al. Overview of the IBM Blue Gene/P Project , 2008, IBM J. Res. Dev..
[10] Sartaj Sahni,et al. Parallel Matrix and Graph Algorithms , 1981, SIAM J. Comput..
[11] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[12] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[13] William Gropp,et al. Skjellum using mpi: portable parallel programming with the message-passing interface , 1994 .
[14] Laxmikant V. Kalé,et al. CHARM++: a portable concurrent object oriented system based on C++ , 1993, OOPSLA '93.
[15] Fumiyoshi Shoji,et al. The K computer: Japanese next-generation supercomputer development project , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.
[16] William J. Dally,et al. Performance Analysis of k-Ary n-Cube Interconnection Networks , 1987, IEEE Trans. Computers.
[17] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[18] Robert A. van de Geijn,et al. A Pipelined Broadcast for Multidimensional Meshes , 1995, Parallel Process. Lett..
[19] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, 2009 17th IEEE Symposium on High Performance Interconnects.
[20] Emmanuel Jeannot,et al. Euro-Par 2011 Parallel Processing , 2011, Lecture Notes in Computer Science.
[21] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.