Communication Benchmarking and Performance Modelling of MPI Programs on Cluster Computers

This paper gives an overview of two related tools that we have developed to measure and model more accurately the performance of message-passing communication and application programs on distributed memory parallel computers. MPIBench uses a very precise, globally synchronised clock to measure the performance of MPI communication routines. It can generate probability distributions of communication times, not just the average values produced by other MPI benchmarks. These distributions give useful insight into the MPI communication performance of parallel computers, and in particular into how performance is affected by network contention. The Performance Evaluating Virtual Parallel Machine (PEVPM) provides a simple, fast and accurate technique for modelling and predicting the performance of message-passing parallel programs. It uses a virtual parallel machine to simulate the execution of the parallel program, and it models the effects of network contention by sampling from the probability distributions generated by MPIBench. These tools are particularly useful on clusters with commodity Ethernet networks, where relatively high latencies, network congestion and TCP problems can significantly affect communication performance in ways that are difficult to model accurately using other tools. Experiments with example parallel programs demonstrate that PEVPM gives accurate performance predictions on commodity clusters. We also show that modelling communication performance using average times, rather than sampling from probability distributions, can give misleading results, particularly for programs running on a large number of processors.
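The difference between the two modelling approaches can be illustrated with a small simulation. The sketch below is not PEVPM or MPIBench code and uses no measured data: the function names, the 100 microsecond floor and the exponential contention delay are all assumptions chosen for illustration. It models a single step in which every process must wait for the slowest of the concurrent messages, and compares a prediction built from the average message time with one built by sampling from the distribution.

```python
import random
import statistics

def sample_comm_time(rng):
    """Draw one hypothetical message time in microseconds (not MPIBench data)."""
    # Assumed shape: a 100 us floor plus a long-tailed contention delay (mean 50 us).
    return 100.0 + rng.expovariate(1.0 / 50.0)

def predict_step_time(num_procs, trials, rng):
    """Mean time of a step in which all processes wait for the slowest message."""
    slowest = []
    for _ in range(trials):
        times = [sample_comm_time(rng) for _ in range(num_procs)]
        slowest.append(max(times))
    return statistics.mean(slowest)

rng = random.Random(42)

# The average-based model uses the mean of the assumed distribution (100 + 50)
# regardless of how many processes take part in the step.
average_model = 150.0

results = {p: predict_step_time(p, 2000, rng) for p in (1, 16, 128)}
for procs, sampled in results.items():
    print(f"{procs:4d} procs: average model {average_model:.1f} us, "
          f"sampled model {sampled:.1f} us")
```

Under these assumptions the two predictions roughly agree for one process, but the sampled prediction grows with the number of processes because the chance that at least one message lands in the tail of the distribution increases. The average-based prediction stays fixed, which is one way that averages can mislead on large processor counts.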
