Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine

In this paper, we present a relatively primitive execution model for fine-grain parallelism, in which all synchronization, scheduling, and storage management are explicit and under compiler control. The model is defined by a threaded abstract machine (TAM) with a multilevel scheduling hierarchy. Considerable temporal locality of logically related threads is demonstrated, providing an avenue for effective register use under quasidynamic scheduling. A prototype TAM instruction set, TL0, has been developed, along with a translator to a variety of existing sequential and parallel machines. Compilation of Id, an extended functional language requiring fine-grain synchronization, under this model yields performance approaching that of conventional languages on current uniprocessors. Measurements suggest that the net cost of synchronization on conventional multiprocessors can be reduced to within a small factor of that on machines with elaborate hardware support, such as proposed dataflow architectures. This calls into question whether tolerance to latency and inexpensive synchronization require specific hardware support or merely an appropriate compilation strategy and program representation.
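
Two levels of the scheduling idea can be sketched in Python. This is a minimal illustrative model under stated assumptions, not TL0 or the paper's translator: the names Frame, Machine, fork, inlet, the per-frame list of enabled threads, and the ready-frame queue are hypothetical choices made for this sketch. Each activation frame holds synchronization counters for threads that wait on several predecessors, plus a list of threads already enabled. The outer level picks a ready frame, and the inner level runs every enabled thread of that frame before moving on, so logically related threads execute close together in time.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional, Tuple


@dataclass(eq=False)  # frames compare by identity, not by field values
class Frame:
    """Hypothetical activation frame: local slots, per-thread entry counts,
    and a list of threads enabled in this frame."""
    name: str
    slots: Dict[str, int] = field(default_factory=dict)
    counts: Dict[str, int] = field(default_factory=dict)  # synchronization counters
    cv: List[str] = field(default_factory=list)           # enabled threads
    queued: bool = False                                  # already on the ready queue?


Thread = Callable[["Machine", Frame], None]


class Machine:
    def __init__(self, threads: Dict[str, Thread]) -> None:
        self.threads = threads
        self.ready: Deque[Frame] = deque()  # outer level: queue of ready frames
        self.current: Optional[Frame] = None
        self.trace: List[Tuple[str, str]] = []

    def fork(self, frame: Frame, thread: str) -> None:
        """Enable `thread` in `frame`. A synchronizing thread (one with an
        entry count) is enabled only when its count reaches zero."""
        if thread in frame.counts:
            frame.counts[thread] -= 1
            if frame.counts[thread] > 0:
                return
        frame.cv.append(thread)
        # A frame with work joins the ready queue unless it is running now.
        if frame is not self.current and not frame.queued:
            frame.queued = True
            self.ready.append(frame)

    def inlet(self, frame: Frame, slot: str, value: int, thread: str) -> None:
        """Hypothetical inlet: store an arriving value in a frame slot,
        then post the thread that consumes it."""
        frame.slots[slot] = value
        self.fork(frame, thread)

    def run(self) -> None:
        while self.ready:
            frame = self.ready.popleft()
            frame.queued = False
            self.current = frame
            # Inner level: drain every enabled thread of this frame.
            while frame.cv:
                name = frame.cv.pop()
                self.trace.append((frame.name, name))
                self.threads[name](self, frame)
            self.current = None


# Usage: in each frame, two threads compute x and y, and "join" waits on both.
def left(m: Machine, f: Frame) -> None:
    f.slots["x"] = f.slots["n"] + 1
    m.fork(f, "join")


def right(m: Machine, f: Frame) -> None:
    f.slots["y"] = f.slots["n"] * 2
    m.fork(f, "join")


def join(m: Machine, f: Frame) -> None:
    f.slots["r"] = f.slots["x"] + f.slots["y"]


m = Machine({"left": left, "right": right, "join": join})
frames = [Frame(f"f{i}", counts={"join": 2}) for i in range(2)]
for i, f in enumerate(frames):
    m.inlet(f, "n", 5 * (i + 1), "left")
    m.fork(f, "right")
m.run()
print([f.slots["r"] for f in frames])  # [16, 31]
```

In the recorded trace, all three threads of f0 run back to back before any thread of f1. That grouping is the kind of locality the abstract identifies as an opening for keeping frame values in registers. The sketch leaves out frame allocation, communication between processors, and everything specific to TL0.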
