IBRMP: A Reliable Multicast Protocol for InfiniBand

Modern distributed applications in high-performance computing (HPC) often need to disseminate data efficiently from one cluster to an arbitrary number of others, which makes multicast techniques attractive. InfiniBand, with its high-throughput, low-latency, and low-overhead communication, has been increasingly adopted as an HPC cluster interconnect. Although InfiniBand hardware multicast is efficient and scalable, it is based on Unreliable Datagrams (UD), which cannot guarantee reliable data distribution. This makes InfiniBand multicast poorly suited to many modern distributed applications. This paper presents the design and implementation of a reliable multicast protocol for InfiniBand (IBRMP). IBRMP is built on InfiniBand's unreliable hardware multicast and uses InfiniBand Reliable Connection (RC) to guarantee data delivery. Our experiments show that IBRMP takes full advantage of InfiniBand multicast, which significantly reduces communication traffic. In our testing environment, IBRMP disseminates data among a group of receivers up to five times faster than using RC alone. Compared with MPI_Bcast, IBRMP provides equally low latency while also transmitting large amounts of data efficiently.
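The abstract does not spell out IBRMP's recovery algorithm. To illustrate the general idea of combining unreliable multicast for bulk delivery with a reliable per-receiver channel for repair, the toy Python model below assumes a receiver-initiated, NACK-style scheme. In this scheme each receiver reports the sequence numbers it missed, and the sender retransmits them over an RC-like unicast channel. The names `Receiver` and `simulate`, the independent per-receiver loss model, and the loss rates are all hypothetical. The model counts sender transmissions only, so it is a sketch of why traffic drops, not a model of IBRMP's actual implementation or timing.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver that records which sequence numbers it holds."""
    rid: int
    received: set = field(default_factory=set)

    def missing(self, total: int) -> list:
        # Gap detection: sequence numbers never delivered by multicast.
        return [seq for seq in range(total) if seq not in self.received]


def simulate(num_packets: int, num_receivers: int, loss_rate: float, seed: int = 0):
    """Return (multicast_sends, repair_sends, rc_only_sends).

    Assumed model (not IBRMP's published design): the sender multicasts each
    packet once over an unreliable UD-style channel, and every receiver
    independently drops it with probability `loss_rate`. Each receiver then
    reports its gaps, and the sender re-sends those packets over a reliable
    RC-style unicast channel that is assumed never to lose data.
    """
    rng = random.Random(seed)
    receivers = [Receiver(r) for r in range(num_receivers)]

    # Phase 1: one multicast transmission per packet, lossy per receiver.
    for seq in range(num_packets):
        for rcv in receivers:
            if rng.random() >= loss_rate:
                rcv.received.add(seq)

    # Phase 2: reliable unicast repair of every reported gap.
    repair_sends = 0
    for rcv in receivers:
        for seq in rcv.missing(num_packets):
            rcv.received.add(seq)
            repair_sends += 1

    # After repair, every receiver holds every packet.
    assert all(len(r.received) == num_packets for r in receivers)

    # Baseline: RC alone needs one separate copy of each packet per receiver.
    rc_only_sends = num_packets * num_receivers
    return num_packets, repair_sends, rc_only_sends


if __name__ == "__main__":
    mc, repair, rc_only = simulate(num_packets=1000, num_receivers=8, loss_rate=0.01)
    print(f"multicast + repair: {mc + repair} sends; RC only: {rc_only} sends")
```

With a low loss rate, the multicast path costs roughly one send per packet plus a small number of repairs, while the RC-only baseline grows linearly with the number of receivers. This is the traffic reduction the abstract attributes to using hardware multicast.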
