Performance Evaluation of MM5 on Clusters with Modern Interconnects: Scalability and Impact

Clusters have become a crucial technology for providing low-cost, high-performance computing to scientific applications such as weather prediction. Networks such as Myrinet, InfiniBand, and Quadrics have in turn become popular interconnects for high-performance clusters. Their high bandwidth and low latency make them well suited to the communication demands of large-scale weather simulations. These networks also provide features such as efficient, scalable hardware broadcast, reduce, and atomic operations. Some of these features have been integrated into the MPI stacks for these networks, allowing users to exploit them for improved performance. In this paper, we evaluate the communication characteristics of MM5, a popular weather simulation code, on InfiniBand. We also investigate how special features of InfiniBand, such as scalable broadcast, can benefit MM5 performance. For some workloads, InfiniBand performs up to 34% better than the other interconnects, and it generally performs better than the other networks across all workloads.
