A prototype data flow computer with token labelling

Computer users in many areas of scientific study appear to have an almost insatiable desire for processing power. Many of the computations exhibit a high degree of parallelism which is not exploited in conventional computer structures. Considerable interest has been shown in parallel computer architectures in the last decade in order to make use of this property. The two main approaches toward this goal have become known as SIMD (array processor) and MIMD (multiprocessor) architectures which are typified by the ILLIAC IV and the C.mmp systems respectively. It is not appropriate to describe these architectures in detail here, but it should be noted that a number of inadequacies have come to light as these systems have been applied to many problems. There are two basic difficulties which appear in both architectures in slightly different guises. The first concerns the arbitrary choice of a fixed number of processing units in an architecture (64 in the case of ILL lAC IV, 16 in the case ofC.mmp). The difficulty that is occasioned by this choice derives from the fact that it is often inconvenient, if not impossible, to organize the problem to be solved so that it is exactly the "width" of the architecture. The second problem arises when sections of essentially serial programs are processed. The problems in SIMD architecture are clear; in MIMD systems the difficulty usually appears when a critical area of memory becomes "locked-out" while many processors are trying to access it. Evidence is accumulating that relatively little of the potential speedup of SIMD and MIMD systems can actually be achieved for a broad spectrum of problems: Minsky's conjecture, 1 Flynn's analysis 2 and Enslow's view of operating system overheads in multiprocessors all attest to this opinion. More recently, interest has arisen in data driven computation for the expression of programs (known as Data Flow). A computation is expressed as a data dependent directed graph with nodes representing computational operations and the arcs representing the flow of data. The nodes become ready for execution when all their input data are available. The expression of parallelism in terms of data dependencies rather than in spite of them, leads to a far more natural and flexible picture of parallel program execution.