A barrier is a commonly used mechanism for synchronizing processors executing in parallel. A software implementation of the barrier mechanism using shared variables has two major drawbacks. First, the synchronization overhead is high. Second, a processor that reaches the barrier must idle until all other processors have reached it. In this paper, the fuzzy barrier, a mechanism that avoids both drawbacks, is presented. The first problem is avoided by implementing the mechanism in hardware. The second problem is solved by using software techniques to find useful instructions that a processor can execute while it awaits synchronization. The hardware implementation eliminates busy waiting at barriers, provides a mask that allows disjoint subsets of processors to synchronize simultaneously, and provides multiple barriers by associating a tag with each barrier. Compiler techniques are presented for constructing barrier regions, which consist of instructions that a processor can execute while it waits for the other processors to reach the barrier. The larger the barrier region, the more likely it is that none of the processors will have to stall. Initial observations show that barrier regions can be large and that program transformations can significantly increase their size.
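The barrier-region idea can be illustrated with a small software emulation. The sketch below is only an analogy, not the hardware mechanism the paper describes: it assumes a split-phase barrier built on Python threads, and the names `FuzzyBarrier`, `arrive`, `wait`, and `run_demo` are hypothetical. Entering the barrier region corresponds to `arrive()`, which never blocks; leaving it corresponds to `wait()`, which stalls only if some participant has not yet arrived. The mask and tag features are omitted.

```python
import threading


class FuzzyBarrier:
    """Software analogy of a fuzzy barrier (hypothetical API, not the paper's hardware)."""

    def __init__(self, parties):
        self.parties = parties
        self.count = 0          # arrivals seen in the current phase
        self.generation = 0     # incremented once every participant has arrived
        self.cond = threading.Condition()

    def arrive(self):
        """Enter the barrier region: announce arrival without blocking."""
        with self.cond:
            ticket = self.generation
            self.count += 1
            if self.count == self.parties:
                self.count = 0
                self.generation += 1
                self.cond.notify_all()
            return ticket

    def wait(self, ticket):
        """Leave the barrier region: returns True if the caller had to stall."""
        with self.cond:
            stalled = self.generation == ticket
            while self.generation == ticket:
                self.cond.wait()
            return stalled


def run_demo(n_workers=4, phases=3):
    barrier = FuzzyBarrier(n_workers)
    shared = [[None] * n_workers for _ in range(phases)]
    sums = [[0] * phases for _ in range(n_workers)]

    def worker(i):
        private = 0
        for p in range(phases):
            shared[p][i] = i + p         # result that other workers depend on
            ticket = barrier.arrive()    # barrier region begins
            private += i * p             # independent work on private state only
            barrier.wait(ticket)         # barrier region ends
            sums[i][p] = sum(shared[p])  # safe: every phase-p write has completed

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sums
```

The more independent work placed between `arrive()` and `wait()`, the more likely it is that every other worker has already arrived by the time `wait()` is called, so the call returns without stalling. This mirrors the abstract's observation that larger barrier regions reduce stalls.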
Rajiv Gupta. The fuzzy barrier: a mechanism for high speed synchronization of processors. ASPLOS 1989.