A Distributed Hardware Mechanism for Process Synchronization on Shared-Bus Multiprocessors

Several techniques have been used to reduce the performance impact of process synchronization in fine-grained multiprocessor systems. These existing techniques tend to have long synchronization times or high shared-bus use, or they require complex and expensive hardware. A new technique is presented that uses distributed hardware locking queues to reduce both contention and latency to the minimum values obtainable on a shared bus. This technique is shown to require at most two shared-bus transactions, with one transaction being typical. The latency for process continuation after obtaining a lock is reduced to near zero. Barrier synchronization using this distributed mechanism requires only one shared-bus transaction per processor involved in the barrier. The new technique is scalable, is applicable to both new architectures and existing systems, and is less complex than other hardware solutions.
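As a rough illustration of the idea of distributed locking queues, the following toy Python model gives every processor a local replica of the lock queue and of a barrier counter, both updated by snooping bus broadcasts. It is a sketch under stated assumptions, not the hardware design described in the paper: the names `Bus`, `Node`, and `demo`, the message kinds, and the choice to count each request and each release as one broadcast are all hypothetical, and the paper's own transaction accounting may differ.

```python
from collections import deque


class Bus:
    """Toy shared bus: each broadcast is one transaction seen by all nodes."""

    def __init__(self):
        self.nodes = []
        self.transactions = 0

    def broadcast(self, kind, sender):
        self.transactions += 1
        for node in self.nodes:
            node.snoop(kind, sender)


class Node:
    """Processor holding a local replica of the lock queue (assumed design)."""

    def __init__(self, ident, bus):
        self.ident = ident
        self.queue = deque()  # local copy of the lock queue
        self.arrived = 0      # local count of barrier arrivals
        bus.nodes.append(self)

    def snoop(self, kind, sender):
        # Every node applies the same update, so all replicas stay identical.
        if kind == "request":
            self.queue.append(sender)
        elif kind == "release":
            self.queue.popleft()
        elif kind == "arrive":
            self.arrived += 1

    def holds_lock(self):
        # Decided from local state only, with no further bus traffic.
        return bool(self.queue) and self.queue[0] == self.ident

    def barrier_open(self, n):
        return self.arrived == n


def demo(n=4):
    bus = Bus()
    nodes = [Node(i, bus) for i in range(n)]

    # Lock phase: every node requests, then holders release in queue order.
    for node in nodes:
        bus.broadcast("request", node.ident)
    order = []
    for _ in range(n):
        holder = next(nd for nd in nodes if nd.holds_lock())
        order.append(holder.ident)
        bus.broadcast("release", holder.ident)
    lock_tx = bus.transactions

    # Barrier phase: one arrival broadcast per participating processor.
    for node in nodes:
        bus.broadcast("arrive", node.ident)
    barrier_tx = bus.transactions - lock_tx
    opened = all(nd.barrier_open(n) for nd in nodes)
    return order, lock_tx, barrier_tx, opened
```

In this model a waiting processor learns that it owns the lock by inspecting its own queue replica, which is one way to picture the near-zero continuation latency, and the barrier phase uses exactly one broadcast per processor.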
