Comparing Ethernet and Myrinet for MPI communication

This paper compares the performance of Myrinet and Ethernet as a communication substrate for MPI libraries. MPI library implementations for Myrinet use user-level communication protocols to provide low-latency, high-bandwidth MPI messaging. In contrast, MPI library implementations for Ethernet use the operating system's network protocol stack, leading to higher message latency and lower message bandwidth. However, on the NAS benchmarks, GM messaging over Myrinet achieves only 5% higher application performance than TCP messaging over Ethernet. Furthermore, efficient TCP messaging implementations improve communication latency tolerance, which closes the performance gap between Myrinet and Ethernet to about 0.3% on the NAS benchmarks. This shows that commodity networking, if used efficiently, can be a viable alternative to specialized networking for high-performance message passing.
