Multithreading: Fundamental Limits, Potential Gains, and Alternatives

Multithreading as a means of tolerating latency, enabling powerful parallel languages, and exposing parallelism is critically examined in order to identify its fundamental limits and potential gains. A simple analytical model shows how the performance gain due to multithreading is related to switch cost, remote reference frequency, and outstanding message capacity. Examination of current networks shows that they support only limited multithreading, due to overhead, channel, and volumetric constraints. Compiler-controlled multithreading is proposed as an alternative to hardware multithreading to make effective use of the processor with a limited number of communication threads. The approach is illustrated by a simple parallel language, Split-C, with split-phase remote references and a novel compilation methodology, TAM, for powerful parallel languages which require dynamic scheduling of a large number of threads.
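
The relationship among switch cost, remote reference frequency, and outstanding message capacity can be illustrated with a small sketch. This is not the paper's own model. It assumes a common deterministic textbook formulation in which every thread runs for a fixed `run_length` cycles between remote references (the inverse of remote reference frequency), every reference takes `latency` cycles, and every context switch costs `switch_cost` cycles. The function name, the parameter names, and the way `max_outstanding` caps the useful thread count are all illustrative assumptions.

```python
def utilization(n_threads, run_length, latency, switch_cost, max_outstanding=None):
    """Processor utilization under an assumed deterministic multithreading model.

    run_length      cycles of useful work between remote references
    latency         cycles a remote reference takes to complete
    switch_cost     cycles spent switching to another thread
    max_outstanding hypothetical cap on remote references in flight at once
    """
    if max_outstanding is not None:
        # Assumption: each waiting thread holds one outstanding reference,
        # so at most max_outstanding threads wait while one thread runs.
        n_threads = min(n_threads, max_outstanding + 1)

    # Thread count at which latency is fully hidden.
    saturation = 1 + latency / (run_length + switch_cost)

    if n_threads >= saturation:
        # Saturated: the processor is always busy, but pays the switch cost.
        return run_length / (run_length + switch_cost)
    # Linear regime: each extra thread fills part of the idle latency.
    return n_threads * run_length / (run_length + switch_cost + latency)


if __name__ == "__main__":
    for n in (1, 4, 8, 16):
        print(n, round(utilization(n, run_length=10, latency=100, switch_cost=5), 3))
```

Under these assumptions, utilization grows linearly with the number of threads until latency is fully hidden, after which it is bounded by `run_length / (run_length + switch_cost)`. A small `max_outstanding` keeps the processor in the linear regime no matter how many threads are available, which is one way to read the abstract's point that network constraints limit the gain from multithreading.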
