Optimizing Matrix Transpose on Torus Interconnects
[1] S. Lennart Johnsson, et al. Algorithms for Matrix Transposition on Boolean n-Cube Configured Ensemble Architectures, 1988, ICPP.
[2] IBM Blue Gene Team. Overview of the IBM Blue Gene/P Project, 2008, IBM J. Res. Dev.
[3] Jack Dongarra, et al. Introduction to the HPCChallenge Benchmark Suite, 2004.
[4] James Demmel, et al. ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance, 1995, PARA.
[5] Soo-Young Lee, et al. Synchronous and Asynchronous Algorithms for Matrix Transposition on MCAP, 1988, Optics & Photonics.
[6] Jack Dongarra, et al. Parallel Matrix Transpose Algorithms on Distributed Memory Concurrent Computers, 1993, Proceedings of Scalable Parallel Libraries Conference.
[7] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[8] Dianne P. O'Leary, et al. Systolic Arrays for Matrix Transpose and Other Reorderings, 1987, IEEE Transactions on Computers.
[9] Daisuke Takahashi, et al. The HPC Challenge (HPCC) Benchmark Suite, 2006, SC.
[10] Leslie G. Valiant, et al. A Scheme for Fast Parallel Communication, 1982, SIAM J. Comput.
[11] J. O. Eklundh, et al. A Fast Computer Method for Matrix Transposing, 1972, IEEE Transactions on Computers.
[12] Harald Räcke. Survey on Oblivious Routing Strategies, 2009, CiE.