Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine

In this paper, we present a relatively primitive execution model for fine-grain parallelism, in which all synchronization, scheduling, and storage management are explicit and under compiler control. The model is defined by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Considerable temporal locality of logically related threads is demonstrated, providing an avenue for effective register use under quasi-dynamic scheduling. A prototype TAM instruction set, TL0, has been developed, along with a translator to a variety of existing sequential and parallel machines. Compilation of Id, an extended functional language requiring fine-grain synchronization, under this model yields performance approaching that of conventional languages on current uniprocessors. Measurements suggest that the net cost of synchronization on conventional multiprocessors can be reduced to within a small factor of that on machines with elaborate hardware support, such as proposed dataflow architectures. This brings into question whether tolerance to latency and inexpensive synchronization require specific hardware support or merely an appropriate compilation strategy and program representation.
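The flavor of explicit synchronization and two-level scheduling can be illustrated with a small sketch. This is not TAM or its instruction set: the names `Frame`, `post`, `run`, and `demo`, the per-thread entry counters, and the LIFO order within a frame are all assumptions made for illustration. The sketch shows only the general idea that a thread becomes runnable when an explicit counter reaches zero, and that the scheduler drains the enabled threads of one frame before moving to the next, which keeps logically related threads close together in time.

```python
from collections import deque

class Frame:
    """Hypothetical activation frame: local slots, explicit sync counters,
    and a frame-local list of enabled threads."""
    def __init__(self, name, entry_counts):
        self.name = name
        self.slots = {}
        self.counters = dict(entry_counts)  # thread name -> inputs still missing
        self.ready = []                     # enabled threads in this frame
        self.queued = False                 # True while queued or running

def post(frame, thread, ready_frames):
    """Explicit synchronization: enable `thread` when its counter hits zero."""
    frame.counters[thread.__name__] -= 1
    if frame.counters[thread.__name__] == 0:
        frame.ready.append(thread)
        if not frame.queued:
            frame.queued = True
            ready_frames.append(frame)

def run(ready_frames, trace):
    """Two-level scheduling: pick a frame, then run every thread enabled in
    it (including ones it enables for itself) before switching frames."""
    while ready_frames:
        frame = ready_frames.popleft()
        while frame.ready:
            thread = frame.ready.pop()
            trace.append((frame.name, thread.__name__))
            thread(frame, ready_frames)
        frame.queued = False

# Three illustrative threads; `add` needs both loads, so its counter is 2.
def load_x(frame, ready_frames):
    frame.slots["x"] = 3
    post(frame, add, ready_frames)

def load_y(frame, ready_frames):
    frame.slots["y"] = 4
    post(frame, add, ready_frames)

def add(frame, ready_frames):
    frame.slots["sum"] = frame.slots["x"] + frame.slots["y"]

def demo():
    ready_frames, trace = deque(), []
    counts = {"load_x": 1, "load_y": 1, "add": 2}
    frames = [Frame(name, counts) for name in ("A", "B")]
    for frame in frames:
        post(frame, load_x, ready_frames)
        post(frame, load_y, ready_frames)
    run(ready_frames, trace)
    return frames, trace

if __name__ == "__main__":
    frames, trace = demo()
    for frame_name, thread_name in trace:
        print(frame_name, thread_name)
```

In this sketch all three threads of frame A run before any thread of frame B, so values such as `x` and `y` could stay in registers across the threads of one frame. That is the kind of temporal locality the abstract refers to, shown here only in toy form.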
