Myrinet networks: a performance study

As network computing becomes commonplace, the interconnection network and its communication system software become critical to achieving high performance. It is therefore essential to systematically assess the features and performance of new networks. Recently, Myricom introduced a two-port "E-card" Myrinet/PCI-X interface. In this paper, we present the basic performance of its GM 2.1 messaging layer, as well as a set of microbenchmarks designed to assess the quality of the MPI implementation layered on top of GM. These microbenchmarks measure latency, bandwidth, intra-node performance, computation/communication overlap, the parameters of the LogP model, the impact of buffer reuse, different traffic patterns, and collective communications. We find that the basic MPI performance is close to that offered at the GM level, and that the host overhead on our system is very small. The Myrinet network is shown to be sensitive to buffer reuse patterns; however, it provides good opportunities for overlapping computation with communication. The Myrinet network is able to deliver up to 2000 MB/s bandwidth for the permutation traffic patterns.
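
The abstract does not include the benchmark source. As a rough illustration of how latency/bandwidth microbenchmarks of this kind are commonly structured, the following MPI ping-pong sketch times a send/receive round trip between two ranks and reports half the round-trip time as latency and message size over one-way time as bandwidth. The message size, repetition count, and output format are illustrative assumptions, not the authors' code.

```c
/* Minimal MPI ping-pong sketch (illustrative, not the paper's benchmark).
 * Run with at least 2 processes, e.g.: mpirun -np 2 ./pingpong 1024 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000  /* repetitions per message size (assumed value) */

int main(int argc, char **argv)
{
    int rank;
    int size = (argc > 1) ? atoi(argv[1]) : 1024;  /* message size in bytes */
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            /* send the message and wait for the echo */
            MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* echo the message back to rank 0 */
            MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double one_way = (t1 - t0) / (2.0 * REPS);  /* one-way time in seconds */
        printf("size %d B  latency %.2f us  bandwidth %.2f MB/s\n",
               size, one_way * 1e6, (size / one_way) / 1e6);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```

Sweeping the message size over powers of two with a harness like this yields the familiar latency curve for small messages and the asymptotic bandwidth for large ones; the paper's other microbenchmarks (overlap, LogP parameters, buffer reuse, permutation traffic) build on the same send/receive timing idea.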
