Performance evaluation of InfiniBand with PCI Express

We present an initial performance evaluation of InfiniBand host channel adapters (HCAs) from Mellanox with PCI Express interfaces and compare them with HCAs using PCI-X interfaces. Our results show that InfiniBand HCAs with PCI Express achieve significant performance benefits. Compared with HCAs using 64-bit/133 MHz PCI-X interfaces, they achieve 20%-30% lower latency for small messages: the small-message latency with PCI Express is around 3.8 µs, compared with 5.0 µs with PCI-X. For large messages, HCAs with PCI Express using a single port deliver unidirectional bandwidth up to 968 MB/s and bidirectional bandwidth up to 1916 MB/s, which are 1.24 and 2.02 times, respectively, the peak bandwidths achieved by HCAs with PCI-X. When both ports of the HCAs are activated, HCAs with PCI Express deliver a peak unidirectional bandwidth of 1486 MB/s and an aggregate bidirectional bandwidth up to 2729 MB/s, which are 1.93 and 2.88 times the peak bandwidths obtained with HCAs with PCI-X. PCI Express also improves performance at the MPI level: a latency of 4.6 µs is achieved for small messages, and for large messages, unidirectional bandwidth of 1497 MB/s and bidirectional bandwidth of 2724 MB/s are observed.
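
As context for how figures such as the 4.6 µs MPI latency and the MB/s bandwidths above are typically obtained, the following is a minimal sketch of an MPI ping-pong microbenchmark between two processes. It is an illustrative assumption, not the authors' benchmark harness; the message sizes, iteration counts, and the throughput derived from half the round-trip time are simplifications (published bandwidth tests usually keep many messages in flight rather than deriving throughput from ping-pong latency).

/* Ping-pong latency/throughput sketch (illustrative, not the authors' harness).
 * Run with exactly two MPI processes, e.g.: mpicc pingpong.c -o pingpong
 *                                           mpirun -np 2 ./pingpong            */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 1000   /* timed round trips per message size (assumed value)   */
#define SKIP   100   /* warm-up round trips excluded from timing (assumed)   */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int size_bytes = 1; size_bytes <= (1 << 20); size_bytes *= 2) {
        char *buf = malloc(size_bytes);
        double t_start = 0.0;

        for (int i = 0; i < ITERS + SKIP; i++) {
            if (i == SKIP)                       /* start timing after warm-up */
                t_start = MPI_Wtime();
            if (rank == 0) {
                MPI_Send(buf, size_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, size_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t_end = MPI_Wtime();

        if (rank == 0) {
            /* one-way latency = half the average round-trip time, in microseconds */
            double latency_us = (t_end - t_start) * 1e6 / (2.0 * ITERS);
            /* bytes per microsecond equals 10^6 bytes per second, i.e. MB/s */
            double mbytes_per_s = size_bytes / latency_us;
            printf("%8d bytes  %10.2f us  %10.2f MB/s\n",
                   size_bytes, latency_us, mbytes_per_s);
        }
        free(buf);
    }

    MPI_Finalize();
    return 0;
}

Built with an MPI compiler wrapper and launched with two processes, the program prints one latency and derived-throughput line per message size, mirroring the small-message latency and large-message bandwidth measurements reported above.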
