Performance prediction of parallel processing systems: the PAMELA methodology

In this paper we present a new methodology for predicting the performance of parallel programs on parallel platforms ranging from shared-memory to distributed-memory (vector) machines. The methodology comprises a procedural program and machine specification paradigm based on PAMELA (PerformAnce ModEling LAnguage), along with a performance calculus called “serialization analysis”. This calculus extends conventional parallel program analysis technology by explicitly accounting for resource contention, while retaining the low evaluation cost typical of static techniques. It is shown that, where conventional techniques introduce fundamental errors, predictions from serialization analysis remain realistic. Apart from the merits of the methodology itself, this high reliability/cost ratio makes PAMELA an attractive candidate for compile-time application within the performance prediction hierarchy often found in parallel programming environments.
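The effect of ignoring resource contention can be illustrated with a small sketch. It is not the paper's serialization analysis calculus, only a simplified illustration under assumed conditions: `n_tasks` identical parallel tasks, each doing `compute` time units of private work and then `service` time units on a single shared resource that serves one task at a time. The function names and parameters are hypothetical.

```python
def critical_path_estimate(n_tasks: int, compute: float, service: float) -> float:
    """Contention-free estimate: every task is assumed to run fully in parallel,
    so the predicted time is just the length of one task."""
    return compute + service


def contention_aware_estimate(n_tasks: int, compute: float, service: float) -> float:
    """Lower bound that also accounts for the shared resource.

    The resource serves one task at a time (assumption), so the total
    demand n_tasks * service cannot be compressed below its sum. The
    prediction is the larger of the task length and the resource demand.
    """
    return max(compute + service, n_tasks * service)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        naive = critical_path_estimate(n, compute=10.0, service=2.0)
        bound = contention_aware_estimate(n, compute=10.0, service=2.0)
        print(f"n={n:2d}  contention-free={naive:5.1f}  contention-aware={bound:5.1f}")
```

In this toy model the two estimates agree while the shared resource is lightly loaded and diverge once `n_tasks * service` exceeds the length of a single task, which is the kind of error a contention-free static analysis would introduce. Both estimates remain closed-form expressions, so the evaluation cost stays low.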
