The continuing rapid progress of VLSI technology is beginning to make possible the construction of very-large-scale parallel computing assemblages. In such systems, tens or even hundreds of thousands of arithmetic devices cooperate to solve certain problems quickly. Parallel computers of this type have been contemplated for many years [1-3], and fairly large parallel machines such as the ILLIAC IV and the ICL DAP have been made operational. Recently, however, technological progress has lent new interest to this area. It has begun to attract the attention of increasing numbers of university and industrial researchers, who have chosen various lines of attack.

One significant approach, pioneered by Kung, focuses on the great economic and speed advantage that can be gained by designing algorithms that conform well to the restrictions imposed by VLSI technology [7], in particular algorithms and parallel system architectures that lay out well in two dimensions. Studies along these lines aim at the design of powerful special-purpose chips and of systems small enough to reside on a single chip.

A second, more conventional approach is represented by the work reported in this article. This approach entails the use of high-performance but otherwise standard microprocessor chips tightly coupled via a suitable network. Central assumptions of this approach are that single-chip processors will be able to execute instructions at a 20-megacycle rate and that megabit memory chips will be available in quantity by the end of the present decade. The possibility of using modified versions of presently existing programming languages to program large parallel machines is an important feature of this work.

A third line of research emphasizes architectures derived from very general abstract data flow models of parallel computation [8,9]. This work has stressed the possible advantages of a purely applicative, side-effect-free programming language for the description of parallel computation [10] (see the sketch following this introduction).

These three approaches lead to machines suited for different environments. Kung's systolic arrays should be most useful for such well-defined, fixed tasks as the kernels of certain signal processing applications. These arrays might be hard to adapt when the algorithms change or when many different cases must be considered. Although data flow machines have been discussed for several years, no optimal architecture has yet emerged. Later in this article, we show how a data flow language can be executed with maximum parallelism on the more conventional parallel machines described here. A crucial part of the design of any highly parallel …
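To make the claim about applicative languages concrete, here is a minimal sketch of our own, not taken from the article; it assumes GHC and the par/pseq combinators from its optional "parallel" package. Because the two operands below are pure, side-effect-free expressions, they cannot interfere through shared state, so a machine is free to evaluate them concurrently without changing the result:

    -- Illustrative sketch: independent, pure subexpressions may safely
    -- be evaluated in parallel, since no hidden state links them.
    import Control.Parallel (par, pseq)  -- from GHC's "parallel" package

    sumSquares :: Int -> Int
    sumSquares n = sum [k * k | k <- [1 .. n]]

    sumCubes :: Int -> Int
    sumCubes n = sum [k * k * k | k <- [1 .. n]]

    main :: IO ()
    main =
      let a = sumSquares 2000000   -- independent computation 1
          b = sumCubes   2000000   -- independent computation 2
      in a `par` (b `pseq` print (a + b))  -- spark a, evaluate b, then combine

The program text imposes no evaluation order on a and b; on a machine of the kind described in this article, the two computations could be scheduled on distinct processors.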
[1] C. Clos, "A Study of Non-Blocking Switching Networks," 1953.
[2] J. Schwartz, "Large Parallel Computers," JACM, 1966.
[3] V. E. Beneš, Mathematical Theory of Connecting Networks and Telephone Traffic, Academic Press, 1965.
[4] M. C. Pease, "Matrix Inversion Using Parallel Processing," JACM, 1967.
[5] R. M. Brown et al., "The ILLIAC IV Computer," IEEE Transactions on Computers, 1968.
[6] H. S. Stone, "Parallel Processing with the Perfect Shuffle," IEEE Transactions on Computers, 1971.
[7] J. B. Dennis and D. P. Misunas, "A Preliminary Architecture for a Basic Data-Flow Processor," ISCA '75, 1975.
[8] D. H. Lawrie, "Access and Alignment of Data in an Array Processor," IEEE Transactions on Computers, 1975.
[9] H. Sullivan and T. R. Bashkow, "A Large Scale, Homogeneous, Fully Distributed Parallel Machine, I," ISCA '77, 1977.
[10] W. Plouffe et al., "An Asynchronous Programming Language and Computing Machine," 1978.
[11] H. T. Kung, "Let's Design Algorithms for VLSI Systems," 1979.
[12] F. P. Preparata and J. Vuillemin, "The Cube-Connected-Cycles: A Versatile Network for Parallel Computation," 20th Annual Symposium on Foundations of Computer Science (FOCS), 1979.
[13] H. T. Kung, "Special-Purpose Devices for Signal and Image Processing: An Opportunity in VLSI," 1980.
[14] L. Rudolph et al., "Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors," ACM TOPLAS, 1983.