High Performance Relay Mechanism for MPI Communication Libraries Run on Multiple Private IP Address Clusters

We have been developing a Grid-enabled MPI communication library called GridMPI, which is designed to run on multiple clusters connected to a wide-area network. Some of these clusters may use private IP addresses, so a mechanism that enables communication between private IP address clusters is required. Such a mechanism should be widely adoptable and should provide high communication performance. In this paper, we propose a message relay mechanism that supports private IP address clusters in the manner of the Interoperable MPI (IMPI) standard; consequently, any MPI implementation that follows the IMPI standard can communicate with the relay. We also propose a trunking method in which multiple pairs of relay nodes communicate between clusters simultaneously to increase the available communication bandwidth. Although the relay mechanism introduces a one-way latency of about 25 µs, this overhead is negligible, since the communication latency over a wide-area network is a few hundred times larger. With trunking, the inter-cluster communication bandwidth improves as the number of trunks increases. We confirmed the effectiveness of the proposed method through experiments on an emulated 10 Gbps WAN environment. When relay nodes with 1 Gbps NICs are used, the performance of most of the NAS Parallel Benchmarks improves in proportion to the number of trunks. In particular, with 8 trunks, FT and IS run 4.4 and 3.4 times faster, respectively, than with a single trunk. The results show that the proposed method is effective for running MPI programs over high bandwidth-delay product networks.
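
The trunking idea can be illustrated with a minimal sketch. The code below is not the paper's implementation; it only shows, under assumed details (a fixed 64 KiB striping unit, an array of already-connected trunk sockets, and a receiver that reassembles chunks in the same round-robin order), how an outgoing message could be striped across multiple TCP connections so that the aggregate inter-cluster bandwidth scales with the number of trunks.

```c
/* Illustrative sketch of trunked relaying: stripe one message across
 * several TCP connections ("trunks"), round-robin, in fixed-size chunks.
 * The chunk size and the already-connected trunk_fd[] array are assumptions
 * made for this example, not details taken from the paper. */
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define CHUNK_SIZE (64 * 1024)   /* hypothetical striping unit */

/* Send all n bytes over one socket, handling partial sends. */
static int send_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t k = send(fd, p, n, 0);
        if (k <= 0)
            return -1;           /* caller handles the error */
        p += k;
        n -= (size_t)k;
    }
    return 0;
}

/* Stripe a len-byte message across ntrunks connected sockets,
 * CHUNK_SIZE bytes at a time, cycling through the trunks. */
static int send_striped(const int *trunk_fd, int ntrunks,
                        const char *buf, size_t len)
{
    size_t off = 0;
    int t = 0;

    while (off < len) {
        size_t chunk = (len - off < CHUNK_SIZE) ? (len - off) : CHUNK_SIZE;
        if (send_all(trunk_fd[t], buf + off, chunk) < 0)
            return -1;
        off += chunk;
        t = (t + 1) % ntrunks;   /* next trunk */
    }
    return 0;
}
```

A matching receiver would read the chunks from the trunks in the same round-robin order; a real relay would additionally need per-chunk framing so that trunks of differing speed do not reorder the reassembled message.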
