Paper Information - High Performance RDMA Protocols in HPC

High Performance RDMA Protocols in HPC

Modern network communication libraries that leverage Remote Direct Memory Access (RDMA) and OS-bypass protocols, such as InfiniBand [2] and Myrinet [10], can offer significant performance advantages over conventional send/receive protocols. However, this performance often comes with hidden per-buffer setup costs [4]. This paper describes a unique long-message MPI [9] library ‘pipeline’ protocol that addresses these constraints while avoiding some of the pitfalls of existing techniques. By using portable send/receive semantics to hide the cost of initializing the pipeline algorithm, and then effectively overlapping the cost of memory registration with RDMA operations, this protocol provides very good performance for any large-memory usage pattern. This approach avoids the use of non-portable memory hooks and does not keep registered memory from being returned to the OS. Through this approach, bandwidth may be increased by up to 67% when memory buffers are not effectively reused, while providing superior performance in the effective bandwidth benchmark. Several user-level protocols are explored using Open MPI's PML (Point-to-point Messaging Layer) and compared and contrasted with this ‘pipeline’ protocol.
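
The benefit of overlapping memory registration with RDMA transfers can be illustrated with a toy cost model. The sketch below is not taken from the paper or from Open MPI. The function names, the fixed per-fragment costs `t_reg` and `t_rdma`, and the assumption that registration and transfer each proceed one fragment at a time are all hypothetical simplifications. It also omits the initial send/receive fragment that the abstract describes as hiding the pipeline setup cost.

```python
def serial_time(n_frags, t_reg, t_rdma):
    """Register every fragment first, then transfer them all: no overlap."""
    return n_frags * t_reg + n_frags * t_rdma


def pipelined_time(n_frags, t_reg, t_rdma):
    """Register fragment i+1 while fragment i is still in flight.

    Assumes (hypothetically) one registration at a time on the host and
    one RDMA operation at a time on the link.
    """
    reg_done = 0.0  # time at which the latest registration has finished
    net_done = 0.0  # time at which the latest RDMA transfer has finished
    for _ in range(n_frags):
        reg_done += t_reg
        # A transfer starts once its fragment is registered and the link is free.
        net_done = max(net_done, reg_done) + t_rdma
    return net_done


if __name__ == "__main__":
    n, t_reg, t_rdma = 8, 1.0, 2.0  # arbitrary illustrative units
    print("serial   :", serial_time(n, t_reg, t_rdma))
    print("pipelined:", pipelined_time(n, t_reg, t_rdma))
```

In this model the pipelined schedule pays the registration cost of only the first fragment on the critical path whenever a transfer takes at least as long as a registration. The remaining registrations are hidden behind RDMA operations. Real costs vary with fragment size and hardware, so the numbers are illustrative only.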

[1] MPI Forum. MPI: A Message-Passing Interface, 1994.

[2] William Gropp, et al. MPI-2: Extending the Message-Passing Interface, 1996, Euro-Par, Vol. I.

[3] Ronald B. Brightwell, et al. Scalability limitations of VIA-based technologies in supporting MPI, 2000.

[4] Rolf Rabenseifner, et al. The Parallel Communication and I/O Bandwidth Benchmarks: b_eff and b_eff_io, 2001.

[5] Jack J. Dongarra, et al. HARNESS and fault tolerant MPI, 2001, Parallel Comput.

[6] Rolf Rabenseifner, et al. Effective Communication and File-I/O Bandwidth Benchmarks, 2001, PVM/MPI.

[7] Dhabaleswar K. Panda, et al. High performance RDMA-based MPI implementation over InfiniBand, 2003, ICS.

[8] Hemal Shah, et al. A study of iSCSI extensions for RDMA (iSER), 2003, NICELI '03.

[9] Andrew Lumsdaine, et al. A Component Architecture for LAM/MPI, 2003, PVM/MPI.

[10] Ronald Minnich, et al. A Network-Failure-Tolerant Message-Passing System for Terascale Clusters, 2002, ICS '02.

[11] George Bosilca, et al. Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, 2004, PVM/MPI.

[12] Dhabaleswar K. Panda, et al. Host-assisted zero-copy remote memory access communication on InfiniBand, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.

[13] Michael M. Resch, et al. Towards Efficient Execution of MPI Applications on the Grid: Porting and Optimization Issues, 2003, Journal of Grid Computing.