Performance of MP3D on the SB-PRAM Prototype (Research Note)

The SB-PRAM is a shared-memory machine that hides latency through simple interleaved context switching; it can be expected to behave almost exactly like a PRAM as long as all threads can be kept busy. We report measured run times of several versions of the MP3D benchmark on the completed hardware of a 64-processor SB-PRAM. The main findings of these experiments are: 1) parallel efficiency is 79% for 32 processors and 56% for 64 processors; 2) parallel efficiency is limited by the number of available threads.
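To put the two efficiency figures in perspective, the short sketch below converts them into implied speedups. It assumes the conventional definition of parallel efficiency, E(p) = S(p) / p with S(p) = T(1) / T(p); the abstract does not state which baseline run time the authors used, so the resulting numbers are illustrative only. The function name `implied_speedup` is hypothetical and not taken from the paper.

```python
def implied_speedup(efficiency: float, processors: int) -> float:
    """Return the speedup implied by E(p) = S(p) / p, i.e. S(p) = E(p) * p.

    Assumption: efficiency is measured against a single-processor baseline.
    """
    return efficiency * processors


# Efficiency figures reported in the abstract, keyed by processor count.
reported = {32: 0.79, 64: 0.56}

for p, e in reported.items():
    print(f"p={p}: efficiency={e:.0%}, implied speedup ~ {implied_speedup(e, p):.1f}")
```

Under this assumption the implied speedup is about 25.3 on 32 processors and about 35.8 on 64 processors: doubling the processor count still reduces run time, but by well under a factor of two.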
