Can NIC Memory in InfiniBand Benefit Communication Performance? — A Study with Mellanox Adapter

This paper presents a comprehensive micro-benchmark performance evaluation of using NIC memory in the Mellanox InfiniBand adapter. Three main benefits have been explored: non-blocking, high-performance host/NIC data movement; reduction of traffic on the local interconnect; and avoidance of the local interconnect bottleneck. Two case studies have been carried out to show how these benefits can be exploited by applications. In the first case, in which the NIC memory is used as an intermediate communication buffer for non-contiguous data communication, lower CPU overhead and better latency are attained. In the second case, a common communication building block, the communication forwarding chain, has been studied. Our results show that using the NIC memory can achieve an improvement of up to a factor of 2.2 over the conventional approach. To the best of our knowledge, this is the first such study to demonstrate the benefits of NIC memory in an InfiniBand adapter.
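
As an illustration of the first case study, the sketch below shows how non-contiguous host data might be packed into adapter-resident memory and registered for communication. It is a minimal sketch only: it uses the modern libibverbs device-memory verbs (ibv_alloc_dm, ibv_memcpy_to_dm, ibv_reg_dm_mr) rather than the mechanism evaluated in the paper, and the block size, block count, and stride are assumed values chosen for the example.

```c
/*
 * Sketch: staging non-contiguous host data in NIC memory so it can be
 * sent as one contiguous buffer.  Uses the modern libibverbs device-memory
 * API; this is NOT the paper's mechanism, only an illustration of the idea.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <infiniband/verbs.h>

#define BLOCK_SIZE  1024    /* bytes per contiguous block (assumed)          */
#define BLOCK_COUNT 16      /* number of strided blocks to pack (assumed)    */
#define STRIDE      4096    /* distance between blocks in host memory        */

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Allocate a contiguous region in the adapter's on-board memory. */
    struct ibv_alloc_dm_attr dm_attr = { .length = BLOCK_SIZE * BLOCK_COUNT };
    struct ibv_dm *dm = ibv_alloc_dm(ctx, &dm_attr);
    if (!dm) { fprintf(stderr, "device memory not supported\n"); return 1; }

    /* Strided source buffer in host memory (the "non-contiguous" data). */
    char *host = malloc((size_t)STRIDE * BLOCK_COUNT);
    memset(host, 'x', (size_t)STRIDE * BLOCK_COUNT);

    /* Pack each block into NIC memory; each copy crosses the local bus once,
     * and the eventual send then reads its payload from the adapter itself. */
    for (int i = 0; i < BLOCK_COUNT; i++)
        ibv_memcpy_to_dm(dm, (uint64_t)i * BLOCK_SIZE,
                         host + (size_t)i * STRIDE, BLOCK_SIZE);

    /* Register the NIC-memory region so it can serve as the local buffer of
     * a send or RDMA write (device-memory MRs use zero-based addressing). */
    struct ibv_mr *mr = ibv_reg_dm_mr(pd, dm, 0, BLOCK_SIZE * BLOCK_COUNT,
                                      IBV_ACCESS_LOCAL_WRITE |
                                      IBV_ACCESS_ZERO_BASED);
    if (!mr) { fprintf(stderr, "ibv_reg_dm_mr failed\n"); return 1; }

    printf("packed %d blocks (%d bytes) into NIC memory, lkey=0x%x\n",
           BLOCK_COUNT, BLOCK_SIZE * BLOCK_COUNT, mr->lkey);

    ibv_dereg_mr(mr);
    ibv_free_dm(dm);
    free(host);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Once the packed data resides in NIC memory, the subsequent send reads its payload directly from the adapter, so the non-contiguous pieces cross the local interconnect only once; this is the traffic-reduction and overhead-reduction benefit the paper's first case study measures.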
