An MPI Tool to Measure Application Sensitivity to Variation in Communication Parameters

This work describes an apparatus for varying the communication performance parameters seen by MPI applications, and provides a tool for analyzing the impact of communication performance on parallel applications. Our tool is built on Myrinet and its GM message-passing layer. We use an extension of the LogP model to allow greater flexibility in determining which parameter or parameters a parallel application is sensitive to. We show that individual communication parameters can be controlled independently to within a small percentage error. We also present the results of applying our tool to a suite of parallel benchmarks.
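To make the idea of parameter sensitivity concrete, the following minimal sketch uses the standard LogP cost model (latency L, overhead o, gap g, processor count P), not the extended model or the Myrinet/GM instrumentation described in this work. The class `LogP`, the functions `burst` and `sensitivity`, and the numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogP:
    """Standard LogP parameters (illustrative units: microseconds)."""
    L: float  # network latency
    o: float  # CPU overhead per send or per receive
    g: float  # minimum gap between consecutive messages
    P: int    # number of processors

    def one_message(self) -> float:
        # Send overhead + wire latency + receive overhead.
        return self.o + self.L + self.o

    def burst(self, n: int) -> float:
        # Time until the last of n back-to-back small messages is received.
        # The sender can inject a new message every max(g, o).
        if n < 1:
            raise ValueError("n must be at least 1")
        return (n - 1) * max(self.g, self.o) + self.one_message()


def sensitivity(base: LogP, param: str, delta: float, n: int) -> float:
    """Change in burst time when one parameter is increased by delta."""
    varied = replace(base, **{param: getattr(base, param) + delta})
    return varied.burst(n) - base.burst(n)


if __name__ == "__main__":
    base = LogP(L=10.0, o=2.0, g=4.0, P=2)  # hypothetical values
    for name in ("L", "o", "g"):
        print(name, sensitivity(base, name, 1.0, n=5))
```

In this toy model a five-message burst is more sensitive to the gap than to latency, because the gap is paid once per message while latency is paid only once. Varying one parameter at a time in this way mirrors, in a much simplified form, the kind of independent control the abstract describes.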
