Paper information - Generation, Optimization, and Evaluation of Multithreaded Code

The recent advent of multithreaded architectures holds much promise: exploiting intrathread locality and the latency tolerance of multithreaded synchronization can yield more efficient processor utilization and higher scalability. The challenge for a code generation scheme is to make effective use of the underlying hardware by generating large threads with a high degree of internal locality, without limiting program-level parallelism or increasing latency. Top-down code generation, where threads are created directly from the compiler's intermediate form, is effective at creating relatively large threads. However, because it has only a limited view of the code at any one time, the quality of the threads it generates is limited. These top-down generated threads can therefore be improved by global, bottom-up optimization techniques. In this paper, we introduce the Pebbles multithreaded model of computation and analyze a code generation scheme in which top-down code generation is combined with bottom-up optimizations. We evaluate the effectiveness of this scheme in terms of overall performance and specific thread characteristics such as size, length, instruction-level parallelism, number of inputs, and synchronization costs.
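The thread characteristics named in the abstract can be illustrated with a small sketch. It is not taken from the paper: the dictionary representation of a thread, the function name `thread_metrics`, and the working definitions (size as instruction count, length as the longest dependence chain, instruction-level parallelism as size divided by length, inputs as values produced outside the thread) are all assumptions made here for illustration, and the paper's own definitions may differ.

```python
def thread_metrics(deps):
    """Compute illustrative metrics for one thread.

    `deps` (a hypothetical representation) maps each instruction in the
    thread to the list of values it consumes. A consumed value that is not
    itself an instruction of the thread counts as an external input.
    """
    size = len(deps)  # assumed definition: number of instructions

    # Values consumed by the thread but produced outside it.
    inputs = {d for ds in deps.values() for d in ds if d not in deps}

    depth = {}  # memoized longest dependence chain ending at each instruction

    def longest(node):
        if node not in deps:  # external input: contributes no instruction
            return 0
        if node not in depth:
            depth[node] = 1 + max((longest(d) for d in deps[node]), default=0)
        return depth[node]

    # Assumed definition: length is the critical path through the thread.
    length = max((longest(n) for n in deps), default=0)

    # Assumed definition: average parallelism along the critical path.
    ilp = size / length if length else 0.0

    return {"size": size, "length": length, "ilp": ilp, "inputs": len(inputs)}


# Three instructions: t1 and t2 are independent, t3 consumes both.
example = {"t1": ["a", "b"], "t2": ["a", "c"], "t3": ["t1", "t2"]}
print(thread_metrics(example))
```

Under these assumed definitions, merging two threads in a bottom-up pass would raise size and may lower the number of inputs, which is the kind of trade-off the evaluation criteria above are meant to capture.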
