Near-Optimal Rendezvous Protocols for RDMA-Enabled Clusters

Optimizing Message Passing Interface (MPI) point-to-point communication for large messages is of paramount importance, since most communication in MPI applications is performed by such operations. Remote Direct Memory Access (RDMA) allows one-sided data transfer and provides great flexibility in the design of efficient communication protocols for large messages. However, achieving high performance on RDMA-enabled clusters remains challenging because of the complexity of both the communication protocols and the scenarios in which they are invoked. In this work, we investigate a profile-driven, compiler-assisted protocol customization approach for efficient communication on RDMA-enabled clusters. We analyze existing protocols and show that they are not ideal in many situations. By leveraging RDMA capabilities, we develop a set of protocols that together can provide near-optimal performance for all protocol invocation scenarios, so that protocol customization can achieve near-optimal performance whenever the appropriate protocol is used for each communication. Finally, we evaluate the potential benefits of protocol customization using micro-benchmarks and application benchmarks. The results demonstrate that the proposed protocols can substantially outperform traditional rendezvous protocols in many situations and that protocol customization can significantly improve MPI communication performance.
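The claim that no single rendezvous protocol suits every invocation scenario can be illustrated with a toy timing model. This is a minimal sketch, not the paper's protocol set or cost model. It compares three commonly discussed RDMA rendezvous variants: RTS/CTS followed by an RDMA write, an RTS carrying the sender's buffer address followed by an RDMA read, and a receiver-initiated scheme in which the receiver advertises its buffer first. It assumes a control-message latency `L`, a bulk-transfer time `T`, no asynchronous progress (a process handles control messages only when it posts its call or once it is blocked in `MPI_Wait`), and a FIN message posted directly behind each RDMA write. The names `Side`, `react`, and `choose_protocol` are hypothetical. Picking the protocol whose later completion is earliest stands in, loosely, for per-communication protocol customization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Side:
    """One process's MPI call timing in this toy model (hypothetical).

    post: time of MPI_Isend or MPI_Irecv
    wait: time the process enters MPI_Wait (assumed wait >= post)
    """
    post: float
    wait: float

    def react(self, t: float) -> float:
        """Earliest time this process can act on a control message arriving at t.

        Assumes no asynchronous progress: messages are handled only inside MPI
        calls, i.e. at post time or from wait onwards.
        """
        if t <= self.post:
            return self.post  # message already queued when the call was posted
        if t < self.wait:
            return self.wait  # process is computing; message waits for MPI_Wait
        return t  # process is already blocked in MPI_Wait


def write_rendezvous(s: Side, r: Side, L: float, T: float) -> tuple[float, float]:
    """RTS -> CTS with receive-buffer address -> RDMA write + FIN.

    Returns (sender_done, receiver_done).
    """
    cts_sent = r.react(s.post + L)  # receiver matches the RTS
    write_issued = s.react(cts_sent + L)  # sender must see the CTS
    data_landed = write_issued + T
    # Assumption: FIN is posted right behind the write and lands L later.
    return max(s.wait, data_landed), max(r.wait, data_landed + L)


def read_rendezvous(s: Side, r: Side, L: float, T: float) -> tuple[float, float]:
    """RTS with send-buffer address -> receiver issues RDMA read -> FIN."""
    read_issued = r.react(s.post + L)
    data_landed = read_issued + L + T  # read request travels L, data takes T
    fin_sent = r.react(data_landed)  # receiver must notice the read finished
    return max(s.wait, fin_sent + L), max(r.wait, data_landed)


def receiver_initiated(s: Side, r: Side, L: float, T: float) -> tuple[float, float]:
    """Receiver sends its buffer address when it posts -> RDMA write + FIN."""
    write_issued = s.react(r.post + L)
    data_landed = write_issued + T
    return max(s.wait, data_landed), max(r.wait, data_landed + L)


PROTOCOLS = {
    "write": write_rendezvous,
    "read": read_rendezvous,
    "receiver-initiated": receiver_initiated,
}


def choose_protocol(s: Side, r: Side, L: float, T: float) -> tuple[str, dict[str, float]]:
    """Pick the protocol whose later completion (sender or receiver) is earliest."""
    costs = {name: max(proto(s, r, L, T)) for name, proto in PROTOCOLS.items()}
    return min(costs, key=costs.get), costs


if __name__ == "__main__":
    # Sender computes between MPI_Isend (t=0) and MPI_Wait (t=100);
    # the receiver posts MPI_Irecv at t=0 and waits immediately.
    best, costs = choose_protocol(Side(0, 100), Side(0, 0), L=1, T=10)
    print(best, costs)  # read {'write': 111, 'read': 100, 'receiver-initiated': 111}
```

In this model the RDMA-read variant wins when the sender is busy computing, because the receiver pulls the data without the sender's CPU; swapping the roles (`Side(0, 0)` for the sender, `Side(0, 100)` for the receiver) makes the receiver-initiated variant win instead. The model deliberately ignores practical constraints such as wildcard (`MPI_ANY_SOURCE`) receives, where the receiver does not know which sender to advertise its buffer to.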
