Compiler-Controlled Multithreading for Lenient Parallel Languages

Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages, yet machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers, which rapidly integrate these events with thread scheduling. To compile Id90 for TAM, we employ a new parallel intermediate form, dual graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described, and preliminary measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.
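To make the execution model concrete, the following is a minimal sketch of the ideas the abstract names: synchronization tested only at the entry of a thread via a counter in an activation frame, and split-phase memory accesses whose replies are handled by generated code that integrates them with thread scheduling. All names here (frame_t, post_thread, split_read, add_thread) are hypothetical illustrations, not the paper's actual TAM interface, and the split-phase reply is simulated inline rather than arriving as a network message.

```c
/* Sketch of TAM-style compiler-controlled multithreading (hypothetical names). */
#include <stdio.h>

typedef struct frame frame_t;
typedef void (*thread_fn)(frame_t *);

struct frame {
    int       sync_count;  /* entry count: thread runs when it reaches 0  */
    long      a, b, sum;   /* frame slots holding operands and the result */
    thread_fn next;        /* continuation thread waiting on the replies  */
};

/* Synchronization happens only at the start of a thread, so the test is a
 * single decrement-and-branch. */
static void post_thread(frame_t *f, thread_fn t) {
    if (--f->sync_count == 0)
        t(f);  /* all inputs present: run the thread to completion */
}

/* Continuation thread: both split-phase replies have arrived. */
static void add_thread(frame_t *f) {
    f->sum = f->a + f->b;
    printf("sum = %ld\n", f->sum);
}

/* Stand-in for a split-phase memory access: the reply handler writes the
 * frame slot and posts the waiting thread. Here the reply is simulated
 * immediately instead of returning asynchronously from memory. */
static void split_read(frame_t *f, long *slot, long simulated_value,
                       thread_fn waiter) {
    *slot = simulated_value;  /* handler stores the reply value ...        */
    post_thread(f, waiter);   /* ... and integrates it with scheduling     */
}

int main(void) {
    frame_t f = { .sync_count = 2, .next = add_thread };
    /* Two outstanding split-phase reads; add_thread fires on the second. */
    split_read(&f, &f.a, 40, f.next);
    split_read(&f, &f.b,  2, f.next);
    return 0;
}
```

In this sketch the entry counter plays the role of explicit synchronization at thread start, which is what makes the cost model simple: each potential switch costs one counter update and a conditional call.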
