Multithreading: a revisionist view of dataflow architectures

Although they are powerful intermediate representations for compilers, pure dataflow graphs are incomplete, and perhaps even undesirable, machine languages. They are incomplete because it is hard to encode critical sections and imperative operations that are essential for the efficient execution of operating system functions, such as resource management. They may be undesirable because they imply a uniform dynamic scheduling policy for all instructions, preventing a compiler from expressing a static schedule that could yield greater run-time efficiency, both by reducing redundant operand synchronization and by using high-speed registers to communicate state between instructions. In this paper, we develop a new machine-level programming model that builds upon two previous improvements to the dataflow execution model: sequential scheduling of instructions, and multiported registers for expression temporaries. Surprisingly, these improvements have required almost no architectural changes to explicit token store (ETS) dataflow hardware, only a shift in mindset when reasoning about how that hardware works. Rather than viewing computational progress as the consumption of tokens and the firing of enabled instructions, we instead reason about the evolution of multiple, interacting sequential threads, where forking and joining are extremely efficient. Because this new paradigm has proven so valuable in coding resource management operations and in improving code efficiency, it is now the cornerstone of the Monsoon instruction set architecture and macro assembly language. In retrospect, this suggests that there is a continuum of multithreaded architectures, with pure ETS dataflow and single-threaded von Neumann at the extrema. We use this new perspective to better understand the relative strengths and weaknesses of the Monsoon implementation.
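The equivalence the abstract draws between token matching and thread joining can be sketched concretely. The following is an illustrative model only, not Monsoon's actual ISA: a single ETS-style frame slot with a presence bit. The first arriving operand is stored and its producer suspends; the second arrival finds the slot full and fires the two-input instruction. Read one way, this is dataflow token matching; read the other way, it is an efficient two-way join between sequential threads.

```python
# Hypothetical sketch of an explicit-token-store rendezvous slot.
# The same presence-bit mechanism serves as dataflow operand matching
# and as a cheap join point between two sequential threads.

class JoinSlot:
    """One frame location with a presence bit, as in ETS matching."""
    def __init__(self):
        self.present = False
        self.value = None

    def arrive(self, value, fire):
        """Deliver one operand; fire(a, b) runs only on the second arrival."""
        if not self.present:
            self.present = True       # first arrival: store operand, suspend
            self.value = value
            return None
        partner, self.value, self.present = self.value, None, False
        return fire(partner, value)   # second arrival: instruction enabled

slot = JoinSlot()
print(slot.arrive(3, lambda a, b: a + b))  # None: first operand waits
print(slot.arrive(4, lambda a, b: a + b))  # 7: the join fires the add
```

The slot is reset after firing, so the same frame location can be reused for the next iteration, mirroring how an ETS frame slot is recycled once its instruction dispatches.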
