A look at application performance sensitivity to the bandwidth and latency of InfiniBand networks

This work explores the expected performance of three applications on a high-performance computing cluster interconnected with InfiniBand. In particular, we analyze the expected performance across a range of configurations, notably InfiniBand 4×, 8× and 12× links, representing link speeds of 10 Gb/s, 20 Gb/s and 30 Gb/s respectively, as well as near-neighbor MPI message latencies of 4 µs and 1.5 µs. We also consider the impact of node size, from one to eight processors sharing a single network connection. The analysis is based on detailed performance models of the three applications developed at Los Alamos. The results show that application performance can vary by as much as 60% between the best and worst configurations. The relative importance of bandwidth, latency and node size differs between the applications.
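The abstract does not give the equations of the Los Alamos models, so the following is only a minimal sketch of why bandwidth, latency and node size can matter differently to different applications. It assumes a simple latency-plus-bandwidth (Hockney-style) message cost, which is much coarser than the detailed models the paper uses. The 80% payload efficiency (8b/10b link encoding), the even sharing of one link among all processors on a node, and the workload parameters are all illustrative assumptions, as are the names `NetworkConfig`, `message_time_us`, `iteration_time_us` and `sweep`.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class NetworkConfig:
    link_gbps: float     # raw InfiniBand link rate: 4x=10, 8x=20, 12x=30 Gb/s
    latency_us: float    # near-neighbor MPI latency (4 or 1.5 microseconds)
    procs_per_node: int  # processors sharing one network connection (1..8)


def message_time_us(size_bytes: float, cfg: NetworkConfig,
                    encoding_efficiency: float = 0.8) -> float:
    """Latency + size / bandwidth, with the node's link split evenly.

    Assumptions (not from the paper): 8b/10b encoding leaves 80% of the raw
    rate for payload, and every processor on a node sends at the same time.
    """
    # 1 Gb/s = 1e3 bits/us = 125 bytes/us before encoding overhead.
    link_bytes_per_us = cfg.link_gbps * 1e3 * encoding_efficiency / 8
    per_proc_bytes_per_us = link_bytes_per_us / cfg.procs_per_node
    return cfg.latency_us + size_bytes / per_proc_bytes_per_us


def iteration_time_us(compute_us: float, messages: int, size_bytes: float,
                      cfg: NetworkConfig) -> float:
    """Hypothetical bulk-synchronous step: compute, then `messages` exchanges."""
    return compute_us + messages * message_time_us(size_bytes, cfg)


def sweep(compute_us: float, messages: int, size_bytes: float) -> dict:
    """Evaluate every link-speed / latency / node-size combination."""
    results = {}
    for gbps, lat, ppn in product((10, 20, 30), (4.0, 1.5), (1, 2, 4, 8)):
        cfg = NetworkConfig(gbps, lat, ppn)
        results[cfg] = iteration_time_us(compute_us, messages, size_bytes, cfg)
    return results


if __name__ == "__main__":
    # Hypothetical workload parameters, chosen only for illustration.
    times = sweep(compute_us=500.0, messages=6, size_bytes=16_384)
    best, worst = min(times.values()), max(times.values())
    print(f"worst/best = {worst / best:.2f}")
    for cfg, t in sorted(times.items(), key=lambda kv: kv[1]):
        print(cfg, f"{t:.1f} us")
```

In this toy model, many small messages make the latency term dominate, while a few large messages make link speed and node size dominate; that is the flavor of application-dependent sensitivity the abstract reports, though the actual models capture far more detail.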
