Resource Management in Dataflow-Based Multithreaded Execution

Due to the large amount of potential parallelism, resource management is a critical issue in multithreaded execution. The challenge in code generation is to control the parallelism without reducing the machine's ability to exploit it. Controlled parallelism reduces idle time, communication, and delay caused by synchronization. At the same time, it increases the potential for exploiting locality in program data structures. In this paper, we evaluate the performance of methods to control program parallelism and resource usage in the context of the fine-grain dataflow execution model. The methods themselves are not new, but their performance analysis is. The two methods considered here for controlling parallelism are slicing and chunking. We present the methods and their compilation strategy and evaluate their effectiveness in terms of run time and matching store occupancy. Communication is categorized into memory, loop, call, and expression communication. Input and output message locality is measured. Two techniques to reduce communication are introduced: grouping allocates loop and function bodies on one processor, and bundling combines messages with the same sender and receiver into one message. Their effects on the total communication volume are quantified.
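As a rough illustration of the bundling idea only, the following Python sketch merges messages that share a sender and a receiver into a single bundle. The message representation (a `(sender, receiver, payload)` tuple), the function name `bundle`, and the scope over which messages are collected are assumptions made for this example; they are not taken from the paper's compiler or run-time system.

```python
from collections import defaultdict

def bundle(messages):
    """Combine messages that share a (sender, receiver) pair.

    `messages` is an iterable of (sender, receiver, payload) tuples.
    This representation is hypothetical and chosen for illustration.
    Returns a dict mapping (sender, receiver) to the list of payloads,
    in their original order.
    """
    bundles = defaultdict(list)
    for sender, receiver, payload in messages:
        bundles[(sender, receiver)].append(payload)
    return dict(bundles)

# Four individual messages between processors 0, 1, and 2.
messages = [(0, 1, "a"), (0, 1, "b"), (0, 2, "c"), (1, 0, "d")]
bundled = bundle(messages)

# One bundle is sent per distinct (sender, receiver) pair.
messages_before = len(messages)
messages_after = len(bundled)
```

In this sketch the two messages from processor 0 to processor 1 travel as one bundle, so the message count drops from four to three while the payloads themselves are unchanged.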
