Real PRAM Programming

The SB-PRAM is a parallel architecture that uses (i) multithreading to hide latency, (ii) a pipelined combining butterfly network to reduce hot spots, and (iii) address hashing to randomize network traffic and reduce memory module congestion. Previous work suggests that such a machine will efficiently simulate shared memory with constant access time independent of the number of processors (i.e., the theoretical PRAM model), provided enough threads can be kept busy. A prototype of a 64-processor SB-PRAM has been completed. We report technical data about this prototype as well as performance measurements. On all benchmark programs measured so far, the real machine was at most 1.37% slower than predicted by simulations that assume perfect shared memory with uniform access time.
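
To illustrate why address hashing reduces memory module congestion, the following minimal sketch compares how a strided access pattern is distributed over memory modules with and without hashing. It does not reproduce the SB-PRAM's actual hash function, which is not specified here; the multiplicative hash, the constant `MULTIPLIER`, the 32-bit address width, and the choice of 64 modules are all assumptions made for illustration.

```python
from collections import Counter

# All constants below are illustrative assumptions, not SB-PRAM parameters.
ADDRESS_BITS = 32
MODULE_BITS = 6
NUM_MODULES = 1 << MODULE_BITS      # 64 memory modules (assumed)
MULTIPLIER = 0x9E3779B1             # arbitrary odd constant (hypothetical)


def hash_address(addr):
    """Multiply by an odd constant modulo 2**32.

    An odd multiplier is invertible modulo a power of two, so this map is
    a bijection on 32-bit addresses: no two addresses collide.
    """
    return (addr * MULTIPLIER) % (1 << ADDRESS_BITS)


def module_of(addr, hashed):
    """Return the index of the memory module that holds addr."""
    if hashed:
        # Use the top bits of the hashed address as the module index.
        return hash_address(addr) >> (ADDRESS_BITS - MODULE_BITS)
    # Without hashing, addresses are interleaved by their low bits.
    return addr % NUM_MODULES


def max_load(addresses, hashed):
    """Largest number of the given addresses that fall into one module."""
    return max(Counter(module_of(a, hashed) for a in addresses).values())


# A stride equal to the module count sends every unhashed access to module 0.
strided = [i * NUM_MODULES for i in range(1024)]
unhashed_load = max_load(strided, hashed=False)
hashed_load = max_load(strided, hashed=True)
print("max module load without hashing:", unhashed_load)
print("max module load with hashing:   ", hashed_load)
```

Without hashing, all 1024 accesses land in a single module; with hashing, the same accesses are spread over many modules, so the worst-case queue at any one module is shorter.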
