A Contention-Aware Performance Model for HPC-Based Networks: A Case Study of the InfiniBand Network

Multi-core clusters are cost-effective and widely used in high-performance computing. Parallel applications that communicate by message passing can exhibit complex communication behaviour on such clusters. By sending data to and receiving data from several nodes simultaneously, they create concurrent accesses to network resources. In this paper, we present a general model of network resource sharing, in which sharing is characterised by a dynamic contention graph. The model is based on a linear system weighted by bandwidth distribution factors, called penalty coefficients, that are specific to a network technology. We propose a method to solve the linear system and present an analysis that determines the penalty coefficients for InfiniBand. We use complex network conflicts to assess the model's ability to make low-error predictions.
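
To make the idea of penalty-weighted bandwidth sharing over a dynamic contention graph concrete, here is a minimal toy sketch. It is not the model from the paper: the penalty rule (the larger of the number of active communications sharing the sender and the number sharing the receiver), the function names `penalties` and `simulate`, and the single `bandwidth` parameter are all assumptions made for illustration. The set of active communications is recomputed every time one of them completes, which is what makes the conflicts "dynamic" in this sketch.

```python
from collections import Counter


def penalties(comms):
    """Toy penalty (assumption): for each active communication, the larger of
    the number of communications sharing its sender or its receiver."""
    out_deg = Counter(src for src, _ in comms.values())
    in_deg = Counter(dst for _, dst in comms.values())
    return {cid: max(out_deg[src], in_deg[dst])
            for cid, (src, dst) in comms.items()}


def simulate(comms, sizes, bandwidth):
    """Return the completion time of each communication.

    comms: {id: (source_node, destination_node)}
    sizes: {id: message size}, in the same unit as bandwidth * time
    bandwidth: uncontended link bandwidth (hypothetical single value)
    """
    active = dict(comms)
    remaining = dict(sizes)
    now = 0.0
    finish = {}
    while active:
        # Rebuild the conflicts for the communications still in flight.
        pen = penalties(active)
        rate = {cid: bandwidth / pen[cid] for cid in active}
        # Advance to the moment the next communication completes.
        dt = min(remaining[cid] / rate[cid] for cid in active)
        now += dt
        for cid in list(active):
            remaining[cid] -= rate[cid] * dt
            # Relative tolerance guards against floating-point residue.
            if remaining[cid] <= 1e-9 * sizes[cid]:
                finish[cid] = now
                del active[cid]
    return finish


if __name__ == "__main__":
    # Two messages leave node 0 concurrently; a third is independent.
    comms = {"a": (0, 1), "b": (0, 2), "c": (3, 4)}
    sizes = {"a": 100.0, "b": 50.0, "c": 100.0}
    print(simulate(comms, sizes, bandwidth=100.0))
```

In this toy setting, "a" and "b" each get half the bandwidth while both are active, and "a" speeds up once "b" completes. Real penalty coefficients would have to be measured per network technology, as the abstract describes for InfiniBand.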
