Compiler-Controlled Multithreading for Lenient Parallel Languages

Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages, yet machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be addressed as a compilation problem, to achieve switching rates approaching what hardware mechanisms might provide. Compiler-controlled multithreading is examined through compilation of a lenient parallel language, Id90, for a threaded abstract machine, TAM. A key feature of TAM is that synchronization is explicit and occurs only at the start of a thread, so that a simple cost model can be applied. A scheduling hierarchy allows the compiler to schedule logically related threads closely together in time and to use registers across threads. Remote communication is via message sends and split-phase memory accesses. Messages and memory replies are received by compiler-generated message handlers, which rapidly integrate these events with thread scheduling. To compile Id90 for TAM, we employ a new parallel intermediate form, dual graphs, with distinct control and data arcs. This provides a clean framework for partitioning the program into threads, scheduling threads, and managing registers under asynchronous execution. The compilation process is described, and preliminary measurements show that the cost of compiler-controlled multithreading is within a small factor of the cost of control flow in sequential languages.
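To make the execution model concrete, the following is a minimal sketch of the ideas the abstract names: synchronization tested only at the entry of a thread via a counter in an activation frame, and split-phase memory accesses whose replies are handled by generated code that integrates them with thread scheduling. All names here (frame_t, post_thread, split_read, add_thread) are hypothetical illustrations, not the paper's actual TAM interface, and the split-phase reply is simulated inline rather than arriving as a network message.

```c
/* Sketch of TAM-style compiler-controlled multithreading (hypothetical names). */
#include <stdio.h>

typedef struct frame frame_t;
typedef void (*thread_fn)(frame_t *);

struct frame {
    int       sync_count;  /* entry count: thread runs when it reaches 0  */
    long      a, b, sum;   /* frame slots holding operands and the result */
    thread_fn next;        /* continuation thread waiting on the replies  */
};

/* Synchronization happens only at the start of a thread, so the test is a
 * single decrement-and-branch. */
static void post_thread(frame_t *f, thread_fn t) {
    if (--f->sync_count == 0)
        t(f);  /* all inputs present: run the thread to completion */
}

/* Continuation thread: both split-phase replies have arrived. */
static void add_thread(frame_t *f) {
    f->sum = f->a + f->b;
    printf("sum = %ld\n", f->sum);
}

/* Stand-in for a split-phase memory access: the reply handler writes the
 * frame slot and posts the waiting thread. Here the reply is simulated
 * immediately instead of returning asynchronously from memory. */
static void split_read(frame_t *f, long *slot, long simulated_value,
                       thread_fn waiter) {
    *slot = simulated_value;  /* handler stores the reply value ...        */
    post_thread(f, waiter);   /* ... and integrates it with scheduling     */
}

int main(void) {
    frame_t f = { .sync_count = 2, .next = add_thread };
    /* Two outstanding split-phase reads; add_thread fires on the second. */
    split_read(&f, &f.a, 40, f.next);
    split_read(&f, &f.b,  2, f.next);
    return 0;
}
```

In this sketch the entry counter plays the role of explicit synchronization at thread start, which is what makes the cost model simple: each potential switch costs one counter update and a conditional call.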
