Optimizing non-blocking collective operations for InfiniBand
[1] Laxmikant V. Kalé,et al. A framework for collective personalized communication , 2003, Proceedings International Parallel and Distributed Processing Symposium.
[2] Torsten Hoefler,et al. Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack , 2006, Euro-Par.
[3] Torsten Hoefler,et al. A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[4] Charles L. Lawson,et al. Basic Linear Algebra Subprograms for Fortran Usage , 1979, TOMS.
[5] D. Martin Swany,et al. Transformations to Parallel Codes for Communication-Computation Overlap , 2005, ACM/IEEE SC 2005 Conference (SC'05).
[6] Torsten Hoefler,et al. Fast barrier synchronization for InfiniBand™ , 2006, Proceedings 20th IEEE International Parallel & Distributed Processing Symposium.
[7] Sergei Gorlatch,et al. Send-receive considered harmful: Myths and realities of message passing , 2004, TOPL.
[8] Torsten Hoefler,et al. Netgauge: A Network Performance Measurement Framework , 2007, HPCC.
[9] Torsten Hoefler,et al. Fast barrier synchronization for InfiniBand , 2006 .
[10] Costin Iancu,et al. HUNTing the overlap , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[11] Torsten Hoefler,et al. Assessing Single-Message and Multi-Node Communication Performance of InfiniBand , 2006, International Symposium on Parallel Computing in Electrical Engineering (PARELEC'06).
[12] Sayantan Sur,et al. Zero-copy protocol for MPI using InfiniBand unreliable datagram , 2007, 2007 IEEE International Conference on Cluster Computing.
[13] Torsten Hoefler,et al. Scalable High Performance Message Passing over InfiniBand for Open MPI , 2007 .
[14] Scott Pakin,et al. The Case of the Missing Supercomputer Performance: Achieving Optimal Performance on the 8,192 Processors of ASCI Q , 2003, SC.
[15] Torsten Hoefler,et al. A Case for Non-blocking Collective Operations , 2006, ISPA Workshops.
[16] Rossen Dimitrov,et al. Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving , 2001 .
[17] Dhabaleswar K. Panda,et al. High performance RDMA-based MPI implementation over InfiniBand , 2003, ICS.
[18] Torsten Hoefler,et al. Design, Implementation, and Usage of LibNBC , 2006 .
[19] Torsten Hoefler,et al. Accurately measuring collective operations at massive scale , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[20] Jeffrey M. Squyres,et al. The Component Architecture of Open MPI: Enabling Third-Party Collective Algorithms , 2005 .
[21] J. White,et al. An Analysis of Popular MPI Implementations , .
[22] Christopher Wilson,et al. COMB: a portable benchmark suite for assessing MPI overlap , 2002, Proceedings. IEEE International Conference on Cluster Computing.
[23] Torsten Hoefler,et al. Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[24] Sayantan Sur,et al. High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters , 2007, ICS '07.
[25] George Bosilca,et al. High Performance RDMA Protocols in HPC , 2006, PVM/MPI.
[26] Torsten Hoefler,et al. Implementation and performance analysis of non-blocking collective operations for MPI , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[27] Sayantan Sur,et al. RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits , 2006, PPoPP '06.
[28] Werner Augustin,et al. On Benchmarking Collective MPI Operations , 2002, PVM/MPI.
[29] Message Passing Interface Forum. MPI: A message-passing interface standard , 1994 .
[30] George Bosilca,et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.
[31] Alex Rapaport,et al. MPI-2: extensions to the message-passing interface , 1997 .
[32] Torsten Hoefler,et al. Non-Blocking Collective Operations for MPI-2 , 2006 .
[33] Intel Corporation. Using the RDTSC instruction for performance monitoring , 1997 .
[34] William Gropp,et al. Reproducible Measurements of MPI Performance Characteristics , 1999, PVM/MPI.