High-performance interconnects such as InfiniBand (IB) have enabled large-scale deployments of High Performance Computing (HPC) systems. High-performance communication and I/O middleware such as MPI and NFS over RDMA have also been redesigned to leverage these modern interconnects. With the advent of long-haul InfiniBand (IB WAN), IB applications now have inter-cluster reach. While this technology is intended to enable high-performance network connectivity across WAN links, it is important to study and characterize the actual performance that existing IB middleware achieves in these emerging IB WAN scenarios. In this paper, we study and analyze the performance characteristics of three HPC middleware: (i) IPoIB (IP traffic over IB), (ii) MPI, and (iii) NFS over RDMA, using Obsidian IB WAN routers for inter-cluster connectivity. Our results show that many applications absorb small network delays fairly well, but most approaches are severely impacted in high-delay scenarios, and communication protocols must be optimized for such scenarios to sustain performance. We propose several such optimizations; our experimental results show that techniques such as WAN-aware protocols, transferring data in large messages (message coalescing), and using parallel data streams can improve communication performance by up to 50% in high-delay scenarios. Overall, these results demonstrate that IB WAN technologies can enable the cluster-of-clusters architecture as a feasible platform for HPC systems.
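To make the message-coalescing idea concrete, the following is a minimal MPI sketch (not the paper's actual implementation): rather than paying the WAN round-trip cost once per small message, a sender packs many small messages into one contiguous buffer and issues a single large send. The message count, message size, and two-rank setup are illustrative assumptions.

```c
/* Minimal sketch of message coalescing over a high-delay link.
 * Assumed parameters: 64 small messages of 1 KB each, ranks 0 and 1. */
#include <mpi.h>
#include <string.h>

#define NUM_SMALL_MSGS 64
#define SMALL_MSG_SIZE 1024   /* bytes per small message (assumed) */

int main(int argc, char **argv)
{
    int rank;
    char small[NUM_SMALL_MSGS][SMALL_MSG_SIZE];            /* per-message payloads */
    char coalesced[NUM_SMALL_MSGS * SMALL_MSG_SIZE];        /* single large buffer */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Pack the small messages into one contiguous buffer ... */
        for (int i = 0; i < NUM_SMALL_MSGS; i++)
            memcpy(coalesced + i * SMALL_MSG_SIZE, small[i], SMALL_MSG_SIZE);

        /* ... and send it once, amortizing the WAN delay over all payloads. */
        MPI_Send(coalesced, sizeof(coalesced), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(coalesced, sizeof(coalesced), MPI_CHAR, 0, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* The receiver can unpack the individual messages from the buffer here. */
    }

    MPI_Finalize();
    return 0;
}
```

The same principle underlies the large-message and parallel-stream optimizations evaluated in the paper: fewer, larger transfers (or several concurrent ones) keep the high-latency WAN pipe full instead of idling between small messages.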