Improving Communication Progress and Overlap in MPI Rendezvous Protocol over RDMA-enabled Interconnects

Overlapping computation with communication is a key technique for hiding communication latency in parallel applications. MPI is a widely used message-passing standard for high-performance computing. One of the most important factors in achieving good overlap is the ability of the MPI implementation to make progress on outstanding communication operations. In this paper, we address some of the communication progress shortcomings of the current polling-based, RDMA Read-based Rendezvous protocol used for transferring large messages in MPI. We then propose a novel speculative Rendezvous protocol that uses RDMA Read and RDMA Write to improve communication progress and, consequently, overlap ability. Performance results from a modified MPICH2 over 10-Gigabit iWARP Ethernet show a significant (80-100%) improvement in receiver-side overlap and progress ability.
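To make the link between progress and overlap concrete, the following is a minimal toy timing model, not the protocol or the measurement method of this paper. It assumes a receiver that posts a nonblocking receive, computes, and then waits, and it compares a polling-based library (which only notices the sender's request on the next MPI call) with one that progresses the transfer independently. The function names, parameters, and time units are all hypothetical.

```python
def receiver_finish_time(compute, rts_arrival, transfer, independent_progress):
    """Time at which the receiver has finished computing and holds the data.

    All arguments are durations or timestamps in the same arbitrary unit,
    measured from the moment the receive is posted (assumed model).
    """
    if independent_progress:
        # The transfer starts as soon as the sender's request arrives,
        # even if the receiver is still computing.
        transfer_start = rts_arrival
    else:
        # Polling: the request is only noticed on the next MPI call,
        # which here is the wait issued after the computation ends.
        transfer_start = max(rts_arrival, compute)
    return max(compute, transfer_start + transfer)


def overlap_fraction(compute, rts_arrival, transfer, independent_progress):
    """Fraction of the transfer time hidden behind computation (0.0 to 1.0)."""
    finish = receiver_finish_time(compute, rts_arrival, transfer,
                                  independent_progress)
    exposed = finish - compute  # time spent waiting after the computation
    return (transfer - min(exposed, transfer)) / transfer


if __name__ == "__main__":
    for mode in (False, True):
        print(mode, overlap_fraction(100.0, 10.0, 50.0, mode))
```

In this model, a request that arrives while the receiver is computing yields no overlap under polling, because the transfer cannot start until the wait call; with independent progress the same transfer can be hidden entirely when the computation is long enough.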
