Performance Modeling with PAMELA: An Introduction

In this report we present a new methodology for predicting the performance of parallel programs on parallel platforms ranging from shared-memory to distributed-memory (vector) machines. The complete methodology comprises the concurrent language Pamela (PerformAnce ModEling LAnguage), the program and machine modeling paradigm, and a novel performance analysis method called "serialization analysis". Although Pamela models can be executed directly (i.e., simulated), serialization analysis allows for (symbolic) model reduction prior to this final evaluation step, which often renders simulation superfluous. This analysis method extends conventional parallel program analysis technology by explicitly accounting for the performance-degrading effects of resource contention, while retaining the low evaluation cost typical of conventional techniques. It is shown that, where conventional techniques may yield serious errors, predictions from serialization analysis remain accurate. Apart from the modeling methodology itself, this combination of low cost and high reliability makes Pamela a particularly suitable candidate for compile-time application within the performance prediction hierarchy often found in parallel programming environments.
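The contrast between a conventional, contention-free estimate and a contention-aware one can be illustrated with a toy fork/join model. The sketch below is a minimal illustration under stated assumptions, not Pamela itself or the report's algorithm. It assumes a program of independent parallel tasks, each optionally holding one shared resource for its whole duration, where each resource has a given multiplicity (the number of tasks it can serve at once). It approximates the contention-aware estimate as the larger of the contention-free critical path and the per-resource serialization bound (total demand divided by multiplicity). The names `Task`, `critical_path_estimate`, `serialization_bound`, and `predict` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    """Hypothetical unit of work: an isolated run time plus an optional shared resource."""
    duration: float                 # time the task takes when it runs without contention
    resource: Optional[str] = None  # shared resource held for the whole duration (assumption)


def critical_path_estimate(tasks: List[Task]) -> float:
    """Conventional, contention-free estimate for a fork/join of independent tasks."""
    return max(t.duration for t in tasks)


def serialization_bound(tasks: List[Task], multiplicity: Dict[str, int]) -> float:
    """Contention lower bound: per-resource total demand divided by its multiplicity."""
    demand: Dict[str, float] = {}
    for t in tasks:
        if t.resource is not None:
            demand[t.resource] = demand.get(t.resource, 0.0) + t.duration
    # The most heavily loaded resource determines the bound; 0.0 if no resource is shared.
    return max((d / multiplicity[r] for r, d in demand.items()), default=0.0)


def predict(tasks: List[Task], multiplicity: Dict[str, int]) -> float:
    """Contention-aware estimate: the larger of the two bounds (simplifying assumption)."""
    return max(critical_path_estimate(tasks), serialization_bound(tasks, multiplicity))


if __name__ == "__main__":
    # Eight unit-time tasks competing for a memory resource that serves two at a time.
    tasks = [Task(1.0, "memory") for _ in range(8)]
    mult = {"memory": 2}
    print(critical_path_estimate(tasks), serialization_bound(tasks, mult), predict(tasks, mult))
```

For eight unit-time tasks sharing a resource with multiplicity 2, the critical path alone predicts 1.0 time units, while the serialization bound gives 4.0, which is the time those tasks actually need when only two can hold the resource at once. This is the kind of error the abstract attributes to analyses that ignore contention.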
