Toward a dataflow/von Neumann hybrid architecture

Dataflow architectures offer the ability to trade program-level parallelism for tolerance of machine-level latency. Dataflow further offers a uniform synchronization paradigm, representing one end of a spectrum in which the unit of scheduling is a single instruction. At the opposite extreme are the von Neumann architectures, which schedule on a task, or process, basis. This paper examines the spectrum by proposing a new architecture that is a hybrid of dataflow and von Neumann organizations. The analysis attempts to discover those features of the dataflow architecture, lacking in a von Neumann machine, that are essential for tolerating latency and synchronization costs. These features are captured in the concept of a parallel machine language, which can be grafted onto an otherwise traditional von Neumann base. In such an architecture, the units of scheduling, called scheduling quanta, are bound at compile time rather than at instruction-set design time. The parallel machine language supports this notion via a large synchronization name space. A prototypical architecture is described, and results of simulation studies are presented. A comparison between the MIT Tagged-Token Dataflow machine and the subject machine yields a model for understanding the cost of synchronization in a parallel environment.
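The scheduling-quantum idea described above can be illustrated with a toy sketch. This is a hedged illustration, not the paper's implementation: it assumes each quantum is a run-to-completion block of conventional (von Neumann-style) code that becomes enabled when a join counter over its named synchronization slots reaches zero, mimicking dataflow-style synchronization at a coarser, compile-time-chosen grain. The names `Quantum`, `send`, and `run` are illustrative inventions.

```python
# Toy sketch of compile-time scheduling quanta with dataflow-style
# synchronization. Each Quantum is a straight-line block of code that
# runs to completion once all of its named input slots have been filled;
# the (quantum, slot) pairs form a large synchronization name space.
from collections import deque

class Quantum:
    """A compile-time-bound unit of scheduling."""
    def __init__(self, n_inputs, body):
        self.missing = n_inputs   # join counter: inputs still awaited
        self.inputs = {}          # slot name -> arrived value
        self.body = body          # run-to-completion code

ready = deque()                   # quanta whose join counter hit zero

def send(q, slot, value):
    """Deliver a token to a quantum's named slot; enable it when full."""
    q.inputs[slot] = value
    q.missing -= 1
    if q.missing == 0:
        ready.append(q)

def run():
    """Drive the machine: run each enabled quantum to completion."""
    results = []
    while ready:
        q = ready.popleft()
        results.append(q.body(q.inputs))
    return results

# Example: compute (a + b) * 2 where a and b arrive asynchronously.
mul = Quantum(1, lambda ins: ins["x"] * 2)
add = Quantum(2, lambda ins: send(mul, "x", ins["a"] + ins["b"]))
send(add, "a", 3)
send(add, "b", 4)
out = run()   # the mul quantum's result is (3 + 4) * 2 = 14
```

The point of the sketch is the grain-size trade the paper discusses: within a quantum, execution is ordinary sequential code; between quanta, synchronization is explicit and token-driven, so latency can be hidden by switching to any other enabled quantum.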
