uDAPL is a portable, platform-independent communication library that provides RDMA as well as send/recv operations. Several well-known software packages, such as Open MPI, MVAPICH2, Intel MPI, and Cluster OpenMP, have attempted to take advantage of uDAPL's portability. However, network performance remains the bottleneck for such software. Using a multirail network is one way to work around it. In this paper, we design a non-threaded and a threaded approach to improve the performance of uDAPL on multirail clusters. The two approaches are evaluated on different InfiniBand multirail clusters. The results show that the threaded approach improves uni-directional bandwidth by 33% and 148% on the multi-port and the multi-HCA network configurations, respectively, and that the non-threaded approach improves uni-directional bandwidth by ~90% on the multi-HCA configuration. Similar improvements are achieved for bi-directional bandwidth.