Execution And Cache Performance Of A Decoupled Non-Blocking Multithreaded Architecture

In this paper we present an evaluation of the execution performance and cache behavior of a new multithreaded architecture under investigation by the authors. Our architecture uses a non-blocking multithreaded model based on the dataflow paradigm. In addition, all memory accesses are decoupled from thread execution: data is pre-loaded into the thread context (registers) before the thread runs, and all results are post-stored after the thread completes. This decoupling requires a separate unit to perform the necessary pre-loads and post-stores and to control the allocation of hardware thread contexts to enabled threads. Because threads are non-blocking, fewer context switches occur, which reduces thread-scheduling overhead. Our functional execution paradigm also eliminates the complex hardware that modern superscalar architectures require for dynamic instruction scheduling. We present preliminary results obtained from an instruction set simulator running several benchmark programs, and we compare the execution and cache performance of our architecture with that of the MIPS architecture, as modeled by the DLX simulator.
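The decoupled, non-blocking execution model can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' instruction set simulator: the `Thread` descriptor, the `run` function, the firing rule (a thread is enabled once every location it reads holds a value) and the fixed pool of `num_contexts` register contexts are all hypothetical. It separates the three phases the abstract describes: the access side pre-loads inputs into a register context, the execute side runs each thread to completion on registers alone, and the access side post-stores results and enables consumer threads.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Thread:
    # Hypothetical thread descriptor (not the paper's ISA): the memory
    # locations it reads, a pure body over register values, and the
    # locations its results are post-stored to.
    name: str
    inputs: List[str]
    body: Callable[..., Tuple]
    outputs: List[str]


def run(threads: List[Thread], memory: Dict[str, int], num_contexts: int = 2) -> List[str]:
    """Simulate a decoupled, non-blocking dataflow thread schedule.

    Returns the order in which threads executed; memory is updated in place.
    """
    waiting = list(threads)
    enabled = deque()
    trace = []

    def promote():
        # Assumed dataflow firing rule: move a thread from waiting to
        # enabled once all of its input locations are present in memory.
        for t in list(waiting):
            if all(addr in memory for addr in t.inputs):
                waiting.remove(t)
                enabled.append(t)

    promote()
    while enabled:
        # Access unit: allocate up to num_contexts register contexts and
        # pre-load each selected thread's inputs into its own context.
        batch = [enabled.popleft() for _ in range(min(num_contexts, len(enabled)))]
        contexts = [[memory[a] for a in t.inputs] for t in batch]

        # Execute unit: each thread runs to completion using only its
        # registers; it never reads memory, so it never blocks mid-thread.
        results = [t.body(*regs) for t, regs in zip(batch, contexts)]
        trace.extend(t.name for t in batch)

        # Access unit: post-store all results (freeing the contexts), then
        # enable any consumer threads whose inputs are now available.
        for t, res in zip(batch, results):
            for addr, val in zip(t.outputs, res):
                memory[addr] = val
        promote()

    if waiting:
        names = ", ".join(t.name for t in waiting)
        raise RuntimeError("threads never enabled (inputs not produced): " + names)
    return trace


# Usage: evaluate r = (a + b) * (c - d) as three single-assignment threads.
mem = {"a": 3, "b": 4, "c": 10, "d": 6}
prog = [
    Thread("mul", ["s", "t"], lambda x, y: (x * y,), ["r"]),
    Thread("add", ["a", "b"], lambda x, y: (x + y,), ["s"]),
    Thread("sub", ["c", "d"], lambda x, y: (x - y,), ["t"]),
]
print(run(prog, mem), mem["r"])  # prints ['add', 'sub', 'mul'] 28
```

Listing `mul` first has no effect on the schedule: it stays waiting until both `add` and `sub` have post-stored `s` and `t`, which mirrors how dataflow enabling, rather than program order, decides when a thread receives a context.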
