Scans as Primitive Parallel Operations

A study of the effects of adding two scan primitives as unit-time primitives to PRAM (parallel random access machine) models is presented. It is shown that these primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. It is therefore argued that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. The author describes five algorithms that illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm, and a merging algorithm. All of them run on an EREW (exclusive read, exclusive write) PRAM augmented with the two scan primitives, and each is either simpler or more efficient than its pure PRAM counterpart. The scan primitives have been implemented in microcode on the Connection Machine system and are available in PARIS (the parallel instruction set of the machine).
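To make the radix-sort example concrete, here is a minimal sequential sketch. The abstract does not name the two primitives; this sketch assumes they are an exclusive plus-scan and an exclusive max-scan. The helper names `plus_scan`, `max_scan`, `split`, and `radix_sort` are hypothetical. Each scan call stands in for what the model treats as a single unit-time parallel step; the Python loops only simulate it. One radix-sort pass is a stable `split` that uses plus-scans to compute every element's destination index.

```python
from typing import List


def plus_scan(a: List[int]) -> List[int]:
    """Exclusive +-scan: out[i] = a[0] + ... + a[i-1], with out[0] = 0.
    Assumed primitive; simulated sequentially here."""
    out, total = [], 0
    for x in a:
        out.append(total)
        total += x
    return out


def max_scan(a: List[int], identity: int) -> List[int]:
    """Exclusive max-scan: out[i] = max(identity, a[0], ..., a[i-1]).
    Assumed primitive; simulated sequentially here."""
    out, best = [], identity
    for x in a:
        out.append(best)
        best = max(best, x)
    return out


def split(values: List[int], flags: List[int]) -> List[int]:
    """Stable split: elements flagged 0 move to the front and elements
    flagged 1 to the back, each group keeping its relative order."""
    n = len(values)
    not_flags = [1 - f for f in flags]
    # Destination of each 0-flagged element: number of 0s before it.
    down = plus_scan(not_flags)
    n_zeros = down[-1] + not_flags[-1] if n else 0
    # Destination of each 1-flagged element: all 0s, then the 1s before it.
    up = [n_zeros + s for s in plus_scan(flags)]
    out = [0] * n
    for i in range(n):  # one parallel permute (exclusive writes) in the model
        out[up[i] if flags[i] else down[i]] = values[i]
    return out


def radix_sort(keys: List[int], bits: int) -> List[int]:
    """LSD radix sort on non-negative keys of `bits` bits: one split per bit."""
    for b in range(bits):
        keys = split(keys, [(k >> b) & 1 for k in keys])
    return keys


if __name__ == "__main__":
    print(plus_scan([3, 1, 7, 0, 4, 1, 6, 3]))       # [0, 3, 4, 11, 11, 15, 16, 22]
    print(radix_sort([5, 7, 3, 1, 4, 2, 7, 2], 3))   # [1, 2, 2, 3, 4, 5, 7, 7]
```

Because `split` is stable, sorting from the least significant bit upward yields a correct sort. Under the assumed unit-time scans, each pass costs a constant number of parallel steps, so a b-bit sort takes O(b) steps.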
