Building the communication performance model of heterogeneous clusters based on a switched network

Analytical communication performance models play an important role in predicting the execution time of parallel applications on multiprocessors. Beyond the design of such a model, one of the main issues is the accurate estimation of its parameters. This paper deals with a heterogeneous analytical communication model designed to predict the performance of MPI communications on heterogeneous clusters based on a switched network. Estimating the parameters of this model accurately is particularly challenging because of their large number. We present a solution to this task based on a carefully designed set of communication experiments, which yields accurate parameter estimates while keeping the total execution time of the experiments as low as possible. We also present experiments demonstrating the accuracy and efficiency of the proposed solution.
