Scalable Shared Memory Multiprocessors

Philip Bitar
Aquarius Project, Computer Science Division, University of California, Berkeley, CA 94720
bitar@berkeley.edu

We develop the synchronization topic of MIMD combining trees (their motivation, their structure, and their parameters), and we illustrate these principles using fetch-and-add. We define the concept of a combining window: an interval of time during which a request is held in a combining node in order to allow it to combine with subsequent incoming requests. We show that the combining window is necessary in order to realize the dual forms of concurrency, execution concurrency and storage concurrency, that a combining tree is designed to achieve. Execution concurrency among the nodes of a combining tree enables the tree to achieve the speedup that it is designed to give; without sufficient execution concurrency, the tree will not achieve that speedup. Storage concurrency among the nodes of a combining tree enables the tree to achieve the buffer storage that is necessary in order to implement the combining of requests; without sufficient storage concurrency, node buffers will overflow. More specifically, the combining window shows how to bound node buffer size.
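The interaction between fetch-and-add combining and a combining window can be illustrated with a small sketch. This is not the paper's model: it assumes a single combining node, discrete arrival times, and a window that opens when a request reaches an empty node. The names `combine_with_window` and `fetch_and_add_all` are hypothetical, chosen for illustration only.

```python
def combine_with_window(arrivals, window):
    """Group (time, requester, increment) arrivals into combined requests.

    Assumption of this sketch: a request that finds the node empty is held
    for `window` time units, and any request arriving in that interval is
    merged into it.
    """
    groups = []
    window_end = None
    for time, requester, increment in sorted(arrivals):
        if window_end is None or time > window_end:
            groups.append([])            # start a new combined request
            window_end = time + window   # hold it for the window length
        groups[-1].append((requester, increment))
    return groups


def fetch_and_add_all(memory_value, groups):
    """Apply each combined request to memory, then decombine the replies."""
    replies = {}
    for group in groups:
        base = memory_value                           # old value from memory
        memory_value += sum(inc for _, inc in group)  # one memory access
        offset = 0
        for requester, inc in group:
            replies[requester] = base + offset  # same as a serial ordering
            offset += inc
    return memory_value, replies


arrivals = [(0, "P0", 1), (1, "P1", 2), (2, "P2", 3), (10, "P3", 4)]

combined = combine_with_window(arrivals, window=3)
uncombined = combine_with_window(arrivals, window=0)

print(len(combined), len(uncombined))       # memory accesses: 2 versus 4
print(max(len(g) for g in combined))        # largest group held at once: 3
print(fetch_and_add_all(100, combined))
print(fetch_and_add_all(100, uncombined))   # same final value and replies
```

In this toy model, widening the window reduces the number of memory accesses without changing the values returned to the requesters, and the largest group gives the number of entries the node must buffer for one combined request. A real combining tree involves many nodes and return-path buffering, which this sketch does not attempt to capture.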
