A Study of Process Arrival Patterns for MPI Collective Operations
[1] I. Rosenblum,et al. Multi-Processor Molecular Dynamics Using the Brenner Potential: Parallelization of an Implicit Multi-Body Potential , 1999 .
[2] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[3] Xin Yuan,et al. A Message Scheduling Scheme for All-to-All Personalized Communication on Ethernet Switched Clusters , 2007, IEEE Transactions on Parallel and Distributed Systems.
[4] Xin Yuan,et al. Pipelined broadcast on Ethernet switched clusters , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[5] Quentin F. Stout,et al. Statistical Analysis of Communication Time on the IBM SP2 , 2008 .
[6] Quentin F. Stout,et al. The Use of the MPI Communication Library in the NAS Parallel Benchmarks , 1999 .
[7] G. Matthews,et al. Molecular dynamics simulator , 1993 .
[8] F. Petrini,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, ACM/IEEE SC 2003 Conference (SC'03).
[9] Basel A. Mahafzah,et al. Statistical analysis of message passing programs to guide computer design , 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.
[10] Peter Sanders,et al. A bandwidth latency tradeoff for broadcast and reduction , 2003, Inf. Process. Lett..
[11] Eli Upfal,et al. Efficient Algorithms for All-to-All Communications in Multiport Message-Passing Systems , 1997, IEEE Trans. Parallel Distributed Syst..
[12] Sathish S. Vadhiyar,et al. Automatically Tuned Collective Communications , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[13] D. Panda,et al. Efficient Barrier and Allreduce on InfiniBand Clusters using Hardware Multicast and Adaptive Algorithms , 2004 .
[14] Ahmad Faraj,et al. Communication Characteristics in the NAS Parallel Benchmarks , 2002, IASTED PDCS.
[15] Xin Yuan,et al. An MPI prototype for compiled communication on Ethernet switched clusters , 2005, J. Parallel Distributed Comput..
[16] R. Rabenseifner,et al. Automatic MPI Counter Profiling of All Users: First Results on a CRAY T3E 900-512 , 2004 .
[17] Jack J. Dongarra,et al. Performance analysis of MPI collective operations , 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[18] Xin Yuan,et al. Automatic generation and tuning of MPI collective communication routines , 2005, ICS '05.
[19] Xin Yuan,et al. STAR-MPI: self tuned adaptive routines for MPI collective operations , 2006, ICS '06.
[20] Jeffrey S. Vetter,et al. An Empirical Performance Evaluation of Scalable Scientific Applications , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[21] Yves Robert,et al. Pipelining broadcasts on heterogeneous platforms , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..
[22] Amith R. Mamidala,et al. Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms , 2004, 2004 IEEE International Conference on Cluster Computing.
[23] Anthony Skjellum,et al. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard , 1996, Parallel Comput..
[24] Rami G. Melhem,et al. Algorithms for Supporting Compiled Communication , 2003, IEEE Trans. Parallel Distributed Syst..
[25] Cécile Germain,et al. Static Communications in Parallel Scientific Programs , 1994, PARLE.
[26] William Gropp,et al. Optimization of Collective Communication Operations in MPICH , 2005 .
[27] Xin Yuan,et al. Bandwidth Efficient All-to-All Broadcast on Switched Clusters , 2005, 2005 IEEE International Conference on Cluster Computing.