Assessing the Ability of Computation/Communication Overlap and Communication Progress in Modern Interconnects

Computation/communication overlap is one of the fundamental techniques for hiding communication latency. Independent progress support in the messaging layer, offload capability in the network interface, and application use of non-blocking communication are all believed to increase overlap and yield performance benefits. In this paper, we analyze four MPI implementations running on three high-speed interconnects (InfiniBand, Myrinet and iWARP Ethernet) for their ability to support overlap and communication progress. The results confirm that offload capability must be accompanied by communication progress to increase the level of overlap. Our progress engine micro-benchmark results show that, on all examined networks, small-message transfers achieve an acceptable level of progress and overlap. In most cases, however, large-message transfers do not progress independently, which reduces the chances of overlap in applications.
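As a rough illustration of what a "level of overlap" can mean, the sketch below derives an overlap fraction from three wall-clock timings: communication alone, computation alone, and the two issued together with a non-blocking call. The function name `overlap_ratio` and the formula are assumptions made for illustration; they are not the metric or the micro-benchmark used in the paper.

```python
def overlap_ratio(t_comm, t_comp, t_total):
    """Fraction of communication time hidden behind computation.

    Hypothetical metric (an assumption, not taken from the paper):
      t_comm  - time of the transfer when nothing else runs
      t_comp  - time of the computation when nothing else runs
      t_total - time when the transfer is started with a non-blocking
                call, the computation runs, and the transfer is then
                waited on
    """
    if t_comm <= 0:
        raise ValueError("t_comm must be positive")
    # Time saved relative to running the two phases back to back.
    hidden = t_comm + t_comp - t_total
    # No more than the shorter of the two phases can be hidden.
    hidden = max(0.0, min(hidden, min(t_comm, t_comp)))
    return hidden / t_comm


if __name__ == "__main__":
    print(overlap_ratio(10.0, 10.0, 10.0))  # full overlap
    print(overlap_ratio(10.0, 10.0, 20.0))  # no overlap (serialized)
    print(overlap_ratio(10.0, 10.0, 15.0))  # partial overlap
```

Under this definition, a transfer that only advances once the application re-enters the messaging library (no independent progress) would yield a ratio near 0 even when the network interface can offload the data movement.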
