Automatic MPI application transformation with ASPhALT
[1] Mohamed M. Zahran, et al. Productivity analysis of the UPC language, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[2] D. Martin Swany, et al. Transformations to Parallel Codes for Communication-Computation Overlap, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[3] Alain Darte, et al. The NESTOR Library: A Tool for Implementing FORTRAN Source Transformations, 1999, HPCN Europe.
[4] Rudolf Eigenmann, et al. Towards automatic translation of OpenMP to MPI, 2005, ICS '05.
[5] Sayantan Sur, et al. RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits, 2006, PPoPP '06.
[6] Jack J. Dongarra, et al. Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors, 1995, Parallel Process. Lett.
[7] Xin Yuan, et al. Automatic generation and tuning of MPI collective communication routines, 2005, ICS '05.
[8] Ken Kennedy, et al. Compiler optimizations for Fortran D on MIMD distributed-memory machines, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[9] Tarek S. Abdelrahman, et al. Computation-Communication Overlap on Network-of-Workstation Multiprocessors, 2001.
[10] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[11] Wei Chen, et al. Message Strip-Mining Heuristics for High Speed Networks, 2004, VECPAR.
[12] Robert W. Numrich, et al. Co-array Fortran for parallel programming, 1998, FORF.
[13] Walter F. Tichy, et al. Measuring High Performance Computing Productivity, 2004, Int. J. High Perform. Comput. Appl.
[14] Ken Kennedy, et al. Strategy for Compiling Parallel Matlab for General Distributions, 2006.
[15] Mark J. Clement, et al. Overlapping Computations, Communications and I/O in parallel Sorting, 1995, J. Parallel Distributed Comput.
[16] Charles L. Seitz, et al. Myrinet: A Gigabit-per-Second Local Area Network, 1995, IEEE Micro.
[17] D. Martin Swany, et al. An automated approach to improve communication-computation overlap in clusters, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[18] Bryan Carpenter, et al. ARMCI: A Portable Remote Memory Copy Library for Distributed Array Libraries and Compiler Run-Time Systems, 1999, IPPS/SPDP Workshops.
[19] Katherine A. Yelick, et al. Optimizing bandwidth limited problems using one-sided communication and overlap, 2005, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[20] Katherine Yelick, et al. Titanium Language Reference Manual, 2001.
[21] Matteo Frigo, et al. A fast Fourier transform compiler, 1999, SIGP.
[22] Walter F. Tichy, et al. Measuring HPC productivity, 2004.