Improving communication performance in dense linear algebra via topology aware collectives
[1] S. Dosanjh,et al. Architectures and Technology for Extreme Scale Computing Report from the Workshop Node Architecture and Power Reduction Strategies , 2011 .
[2] S. Lennart Johnsson,et al. Optimum Broadcasting and Personalized Communication in Hypercubes , 1989, IEEE Trans. Computers.
[3] Philip K. McKinley,et al. Collective Communication in Wormhole-Routed Massively Parallel Computers , 1995, Computer.
[4] Robert A. van de Geijn,et al. SUMMA: scalable universal matrix multiplication algorithm , 1995, Concurr. Pract. Exp..
[5] James Demmel,et al. CALU: A Communication Optimal LU Factorization Algorithm , 2011, SIAM J. Matrix Anal. Appl..
[6] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[7] Robert A. van de Geijn,et al. Broadcasting on Meshes with Wormhole Routing , 1996, J. Parallel Distributed Comput..
[8] Lynn Elliot Cannon,et al. A cellular computer to implement the kalman filter algorithm , 1969 .
[9] William J. Dally,et al. Technology-Driven, Highly-Scalable Dragonfly Topology , 2008, 2008 International Symposium on Computer Architecture.
[10] Josep Torrellas. Architectures for Extreme-Scale Computing , 2009, Computer.
[11] Amith R. Mamidala,et al. MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations , 2009, 2009 17th IEEE Symposium on High Performance Interconnects.
[12] James Demmel,et al. Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms , 2011, Euro-Par.
[13] Roger W. Hockney. The communication challenge for MPP , 1994 .
[14] Dror Irony,et al. Trading Replication for Communication in Parallel Distributed-Memory Dense Solvers , 2002, Parallel Process. Lett..
[15] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[16] Jack J. Dongarra,et al. Performance analysis of MPI collective operations , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[17] Jack Dongarra,et al. ScaLAPACK user's guide , 1997 .
[18] James Demmel,et al. Communication avoiding Gaussian elimination , 2008, HiPC 2008.
[19] P. Sadayappan,et al. Communication efficient matrix multiplication on hypercubes , 1994, SPAA '94.
[20] D. G. Payne,et al. Broadcasting on Meshes with Worm-hole Routing , 1996 .
[21] Ramesh Subramonian,et al. LogP: towards a realistic model of parallel computation , 1993, PPOPP '93.
[22] Chris J. Scheiman,et al. LogGP: incorporating long messages into the LogP model—one step closer towards a realistic model for parallel computation , 1995, SPAA '95.
[23] Ramesh C. Agarwal,et al. A three-dimensional approach to parallel matrix multiplication , 1995, IBM J. Res. Dev..
[24] P. Sadayappan,et al. Communication-Efficient Matrix Multiplication on Hypercubes , 1996, Parallel Comput..
[25] Robert A. van de Geijn,et al. SUMMA: Scalable Universal Matrix Multiplication Algorithm , 1995 .
[26] Alok Aggarwal,et al. Communication Complexity of PRAMs , 1990, Theor. Comput. Sci..
[27] Philip Heidelberger,et al. The deep computing messaging framework: generalized scalable message passing on the blue gene/P supercomputer , 2008, ICS '08.
[28] James Demmel,et al. Minimizing Communication in Linear Algebra , 2009, ArXiv.