Integrated Performance Models for SPMD Applications and MIMD Architectures

This paper introduces queuing network models for the performance analysis of SPMD (single-program, multiple-data) applications executed on general-purpose parallel architectures such as MIMD (multiple-instruction, multiple-data) machines and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces, which capture the relative influence of processor and I/O parallelism and show the effects of different hardware and software components on performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., the number of processors, number of disks, I/O topology, etc.).
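To make the idea of a speedup surface concrete, the sketch below tabulates speedup over a grid of processor counts and disk counts. It does not reproduce the queuing network models of the paper: the execution-time formula, the function names, and the parameters `t_cpu`, `t_io`, and `t_comm` are all illustrative assumptions (compute time divides across processors, I/O time divides across disks, and communication cost grows linearly with the number of processors).

```python
# Illustrative only: a toy deterministic time model, NOT the paper's
# queuing network model. All names and parameter values are assumptions.

def exec_time(p, d, t_cpu=100.0, t_io=50.0, t_comm=0.5):
    """Assumed execution time with p processors and d disks.

    Compute work is split across p processors, I/O work across d disks,
    and communication overhead grows linearly with p.
    """
    return t_cpu / p + t_io / d + t_comm * (p - 1)


def speedup(p, d, **params):
    """Speedup relative to one processor and one disk."""
    return exec_time(1, 1, **params) / exec_time(p, d, **params)


def speedup_surface(max_p, max_d, **params):
    """Grid of speedups: rows are p = 1..max_p, columns are d = 1..max_d."""
    return [
        [speedup(p, d, **params) for d in range(1, max_d + 1)]
        for p in range(1, max_p + 1)
    ]


if __name__ == "__main__":
    surface = speedup_surface(max_p=8, max_d=4)
    for p, row in enumerate(surface, start=1):
        print(p, " ".join(f"{s:6.2f}" for s in row))
```

Under these assumed parameters, adding processors alone quickly stops helping because the undivided I/O term dominates, while adding disks as well keeps the surface rising. This kind of interplay between processor and I/O parallelism is what a speedup surface is meant to expose.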
