STAR-MPI: self tuned adaptive routines for MPI collective operations
[1] Satoshi Matsuoka, et al. OMPI: Optimizing MPI Programs using Partial Evaluation, 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[2] Mario Lauria, et al. MPI-FM: High Performance MPI on Workstation Clusters, 1997, J. Parallel Distributed Comput.
[3] Eli Upfal, et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems, 1997, IEEE Trans. Parallel Distributed Syst.
[4] James Demmel, et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology, 1997, ICS '97.
[5] Jack J. Dongarra, et al. Automatically Tuned Linear Algebra Software, 1998, Proceedings of the IEEE/ACM SC98 Conference.
[6] Steven G. Johnson, et al. FFTW: an adaptive software architecture for the FFT, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[7] I. Rosenblum, et al. MULTI-PROCESSOR MOLECULAR DYNAMICS USING THE BRENNER POTENTIAL: PARALLELIZATION OF AN IMPLICIT MULTI-BODY POTENTIAL, 1999.
[8] William Gropp, et al. Reproducible Measurements of MPI Performance Characteristics, 1999, PVM/MPI.
[9] S. Sistare, et al. Optimization of MPI Collectives on Clusters of Large-Scale SMPs, 1999, ACM/IEEE SC 1999 Conference (SC'99).
[10] Henri E. Bal, et al. MagPIe: MPI's collective communication operations for clustered wide area systems, 1999, PPoPP '99.
[11] Steve Sistare, et al. Optimization of MPI Collectives on Clusters of Large-Scale SMP's, 1999, SC.
[12] Tao Yang, et al. Program transformation and runtime support for threaded MPI execution on shared-memory machines, 2000, TOPL.
[13] Sathish S. Vadhiyar, et al. Automatically Tuned Collective Communications, 2000, ACM/IEEE SC 2000 Conference (SC'00).
[14] Xin Yuan, et al. CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters, 2003, PPoPP '03.
[15] R. Rabenseifner, et al. Automatic MPI Counter Profiling of All Users: First Results on a CRAY T3E 900-512, 2004.
[16] Xin Yuan, et al. Automatic generation and tuning of MPI collective communication routines, 2005, ICS '05.
[17] Xin Yuan, et al. Message scheduling for all-to-all personalized communication on ethernet switched clusters, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[18] Rajeev Thakur, et al. Optimization of Collective Communication Operations in MPICH, 2005, Int. J. High Perform. Comput. Appl.
[19] Xin Yuan, et al. Bandwidth Efficient All-to-All Broadcast on Switched Clusters, 2005, 2005 IEEE International Conference on Cluster Computing.
[20] Xin Yuan, et al. Pipelined broadcast on Ethernet switched clusters, 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.