Design and performance evaluation of a multithreaded architecture

Multithreaded architectures have the ability to tolerate long memory latencies and unpredictable synchronization delays. We propose a multithreaded architecture that is capable of exploiting both coarse-grain parallelism, and fine-grain instruction level parallelism in a program. Instruction-level parallelism is exploited by grouping instructions from a number of active threads at runtime. The architecture supports multiple resident activations to improve the extent of locality exploited. Further, a distributed data structure cache organization is proposed to reduce both the network: traffic and the latency in accessing remote locations. Initial performance evaluation using discrete-event simulation indicates that the architecture is capable of achieving very high processor throughput. The introduction of the data structure cache reduces the network latency significantly. The impact of various cache organizations on the performance of the architecture is also discussed in this paper.<<ETX>>

[1]  Robert H. Halstead,et al.  Multithreaded Computer Architecture , 1994, The Kluwer International Series in Engineering and Computer Science.

[2]  William J. Dally,et al.  Processor coupling: integrating compile time and runtime scheduling for parallelism , 1992, ISCA '92.

[3]  Kozo Kimura,et al.  An Elementary Processor Architecture with Simultaneous Instruction Issuing from Multiple Threads , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[4]  Keshav Pingali,et al.  I-structures: Data structures for parallel computing , 1986, Graph Reduction.

[5]  William J. Dally A Multicomputer Processing Node with Efficient Mechanisms , 1992 .

[6]  Arvind,et al.  T: A Multithreaded Massively Parallel Architecture , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[7]  David E. Culler,et al.  Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine , 1991, ASPLOS IV.

[8]  Bob Iannucci Toward a dataflow/von Neumann hybrid architecture , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.

[9]  Robert A. Iannucci,et al.  Editors: Multithreaded computer architecture : A summary of the state of the art , 1994 .

[10]  Allan Porterfield,et al.  The Tera computer system , 1990 .

[11]  Anant Agarwal,et al.  APRIL: a processor architecture for multiprocessing , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[12]  D. Steiss,et al.  A three-million-transistor microprocessor , 1992, 1992 IEEE International Solid-State Circuits Conference Digest of Technical Papers.

[13]  Shuichi Sakai,et al.  A prototype of a highly parallel dataflow machine EM-4 and its preliminary evaluation , 1992, Future Gener. Comput. Syst..

[14]  David E. Culler,et al.  Monsoon: an explicit token-store architecture , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.

[15]  David E. Culler,et al.  Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine , 1991, ASPLOS IV.

[16]  R. Govindarajan,et al.  SMALL: a scalable multithreaded architecture to exploit large locality , 1992, [1992] Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing.

[17]  Alexander V. Veidenbaum,et al.  Compiler-directed cache management in multiprocessors , 1990, Computer.