Using CMT in SCTP-Based MPI to Exploit Multiple Interfaces in Cluster Nodes

Many existing clusters use inexpensive Gigabit Ethernet, and their nodes often have multiple interface cards to improve bandwidth and enhance fault tolerance. We investigate the use of Concurrent Multipath Transfer (CMT), an extension to the Stream Control Transmission Protocol (SCTP), to exploit multiple network interfaces in MPI programs. We evaluate the performance of our system with microbenchmarks and MPI collective routines. We also compare our method, which employs CMT at the transport layer in the operating system kernel, with existing systems that support multi-railing in the middleware. We discuss performance with respect to bandwidth, latency, congestion control, and fault tolerance.
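The following minimal Python sketch illustrates the idea behind CMT. Because CMT runs in the transport layer, one SCTP association can send new data over all of a node's interfaces at once rather than over a single primary path.

The sketch makes several simplifying assumptions:

- The sender has two paths, with the hypothetical names `eth0` and `eth1`.
- Each path has a fixed congestion window.
- Data chunks are identified by transmission sequence number (TSN) and assigned round-robin to every path whose window still has room.

The `Path` class, the `stripe` function and the window sizes are illustrative assumptions. This is not the kernel CMT implementation. It omits acknowledgements, retransmission, and the changes CMT makes to fast retransmit and congestion-window updates.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One destination address of a multihomed association (toy model, hypothetical)."""
    name: str
    cwnd: int             # congestion window in bytes (fixed here; real CMT adapts it per path)
    outstanding: int = 0  # bytes sent on this path and not yet acknowledged

    def can_send(self, size: int) -> bool:
        return self.outstanding + size <= self.cwnd


def stripe(chunks: list[tuple[int, int]], paths: list[Path]) -> list[tuple[int, str]]:
    """Assign (tsn, size) chunks to paths round-robin, skipping paths whose cwnd is full.

    Stops at the first chunk that no path can accept (a real sender would wait for SACKs).
    Returns the (tsn, path name) schedule in send order.
    """
    schedule: list[tuple[int, str]] = []
    start = 0  # index of the path to try first for the next chunk
    for tsn, size in chunks:
        for i in range(len(paths)):
            idx = (start + i) % len(paths)
            path = paths[idx]
            if path.can_send(size):
                path.outstanding += size
                schedule.append((tsn, path.name))
                start = (idx + 1) % len(paths)  # continue round-robin after this path
                break
        else:
            break  # every path's window is full
    return schedule
```

Consider a 4000-byte window on `eth0`, a 2000-byte window on `eth1`, and six 1000-byte chunks. TSNs 1, 3, 5 and 6 go out on `eth0`, and TSNs 2 and 4 go out on `eth1`, so the path with more window carries more traffic. Because consecutive chunks travel on different paths, they can arrive out of order. The receiver uses the TSNs to restore order.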
