A synchronization barrier is a point in a program at which the processing elements (PEs) wait until all PEs have arrived. In a reduction computation, given a commutative and associative binary operation op, one needs to reduce values a0, ..., aN-1, stored in PEs 0, ..., N-1, to a single value a* = a0 op a1 op ... op aN-1 and then broadcast the result a* to all PEs. This computation is often followed by a synchronization barrier. Routines to perform these functions are frequently required in parallel programs. Simple, efficient, working C-language routines for parallel barrier synchronization and reduction computations are presented. The codes are appropriate for a CREW (concurrent-read, exclusive-write) or EREW parallel random-access shared-memory MIMD computer. They require only shared memory reads and writes; no locks, semaphores, etc. are needed. The running time of each of these routines is O(log N). The amount of shared memory required and the number of shared memory accesses generated are both O(N). These are the asymptotically minimum values for the three parameters. The algorithms employ the obvious computational scheme involving a binary tree. Examples of applications for these routines and results of performance testing on the Sequent Balance 21000 computer are presented.
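
To illustrate the binary-tree scheme the abstract refers to, here is a minimal C sketch, not the paper's routines: it performs the reduction (with op specialized to addition) and the broadcast of a*, and thereby also acts as a barrier, using only loads and stores of shared flags. The names NPES, tree_reduce_bcast, and pe_main are illustrative; C11 atomics and POSIX threads stand in for the Sequent environment, and the flags are written only once per call, so this version is single-use, whereas a reusable version would reset or sense-reverse the flags between calls.

/*
 * A minimal sketch, not the paper's code: the obvious binary-tree scheme,
 * specialized to op = addition.  Each PE publishes a partial value, pairs
 * of PEs combine up the tree in O(log N) steps, and PE 0 broadcasts the
 * result a* back down the same tree, so the routine also acts as a barrier.
 * The names NPES, tree_reduce_bcast, and pe_main are illustrative only.
 * C11 atomic flags are used so the spin loops are well defined; they are
 * still only loaded and stored, never locked or read-modify-written.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NPES 8                          /* number of PEs; a power of two keeps the tree simple */

static long       value[NPES];          /* partial results, one slot per PE             */
static atomic_int arrived[NPES];        /* "my partial result is ready" flags           */
static atomic_int released[NPES];       /* "the final result has been delivered" flags  */

/* Reduce my_val over all PEs and return a*; every PE gets the same answer. */
static long tree_reduce_bcast(int id, long my_val)
{
    int s;

    value[id] = my_val;

    /* Upward phase: at step s, PE id absorbs PE id+s if id is a multiple
       of 2s; otherwise it publishes its partial result and waits.        */
    for (s = 1; s < NPES; s <<= 1) {
        if (id % (2 * s) != 0) {
            atomic_store(&arrived[id], 1);          /* tell my parent I am done  */
            while (!atomic_load(&released[id]))
                ;                                   /* wait for the broadcast    */
            break;
        }
        while (!atomic_load(&arrived[id + s]))
            ;                                       /* wait for my child         */
        value[id] += value[id + s];                 /* the reduction op          */
    }
    /* Here value[id] holds a*: either id == 0 and the loop finished, or the
       parent stored a* into value[id] before setting released[id].          */

    /* Downward phase: forward a* to the PEs that reported to me. */
    for (s >>= 1; s >= 1; s >>= 1) {
        value[id + s] = value[id];
        atomic_store(&released[id + s], 1);
    }
    return value[id];
}

static void *pe_main(void *arg)
{
    int  id    = (int)(long)arg;
    long total = tree_reduce_bcast(id, id + 1);     /* reduces 1 + 2 + ... + NPES */
    printf("PE %d sees a* = %ld\n", id, total);
    return NULL;
}

int main(void)
{
    pthread_t pe[NPES];
    for (long i = 0; i < NPES; i++)
        pthread_create(&pe[i], NULL, pe_main, (void *)i);
    for (int i = 0; i < NPES; i++)
        pthread_join(pe[i], NULL);
    return 0;
}

Compiled with cc -pthread, each of the NPES threads prints the same total (36 for NPES = 8), which exercises both the reduction/broadcast and the barrier behavior, since no PE can return before every PE has contributed its value.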