An Optimal Implementation of Broadcasting with Selective Reduction

Broadcasting with selective reduction (BSR) is a model of parallel computation that can be viewed as a concurrent-read concurrent-write (CRCW) parallel random access machine (PRAM) with one extension: an additional type of concurrent memory access, the BROADCAST instruction, by which all N processors may access all M memory locations simultaneously for the purpose of writing. At each memory location, a subset of the incoming broadcast data is selected and reduced to a single value, which is then stored in that location. For several problems, BSR algorithms are known that require fewer steps than the best-known PRAM algorithms using the same number of processors. A circuit implementing the BSR model is introduced, and it is shown to be of the same order, in both size and depth, as an optimal circuit implementing the PRAM. Thus, if it is reasonable to assume that CRCW PRAM instructions execute in constant time, the assumption of a constant-time BROADCAST instruction is no less reasonable.
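The select-then-reduce behaviour of a single BROADCAST step can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the circuit from the paper: the abstract does not say how the subset is selected, so the sketch assumes each processor broadcasts a (tag, datum) pair and each memory location holds a limit that tags are compared against. The function name `bsr_broadcast` and all parameter names are hypothetical, and the loop runs in O(N·M) sequential time, whereas the model treats the step as a single instruction.

```python
import operator
from functools import reduce


def bsr_broadcast(tags, data, limits, select, combine, memory):
    """Sequentially simulate one BROADCAST step (illustrative assumption only).

    tags[i], data[i] : the pair broadcast by processor i (N processors)
    limits[j]        : the value location j compares incoming tags against
    select           : binary predicate, e.g. operator.le or operator.eq
    combine          : binary reduction operator, e.g. operator.add or max
    memory           : current contents of the M memory locations
    """
    result = list(memory)
    for j, limit in enumerate(limits):
        # Selection: keep the data whose tag satisfies the rule at location j.
        chosen = [d for t, d in zip(tags, data) if select(t, limit)]
        # Reduction: collapse the selected data to one stored value.
        # Assumption: a location that selects nothing keeps its old content.
        if chosen:
            result[j] = reduce(combine, chosen)
    return result


# Usage: prefix sums in one BROADCAST step.
# Processor i broadcasts (i, values[i]); location j selects tags <= j and adds.
values = [3, 1, 4, 1, 5]
n = len(values)
prefix = bsr_broadcast(
    tags=range(n),
    data=values,
    limits=range(n),
    select=operator.le,
    combine=operator.add,
    memory=[0] * n,
)
print(prefix)  # [3, 4, 8, 9, 14]
```

Swapping `select` and `combine` gives other one-step operations under the same assumptions, for example `operator.eq` with `max` to let each location keep the largest datum addressed exactly to it.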
