Performance of shared cache on multithreaded architectures

We use trace-driven simulation to study how multithreaded execution affects the performance of the storage hierarchy. In particular, we examine the effects of different thread scheduling techniques on cache performance, using several program traces that represent a typical server/workstation workload mix. An MRU (most recently used) priority scheduling scheme is proposed as the baseline for studying these effects. We found that cache performance improves over traditional round-robin scheduling when the thread with the MRU hit is given higher priority. With a direct-mapped cache, the absolute hit ratio can be improved by 7% over the original ratio. We also studied cache performance with a varying number of concurrent threads. The results showed that both the cache size and the set associativity need to increase with the number of threads in order to maintain comparable cache performance. The main contribution of this paper is a performance comparison of two simple, easy-to-implement schemes against the proposed baseline scheme.
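The comparison described in the abstract can be illustrated with a toy trace-driven simulation. The sketch below is not the paper's simulator: the class `DirectMappedCache`, the function `simulate`, the policy names, the per-thread tagging of cache lines, and the two synthetic traces are all assumptions made for illustration. In particular, the "MRU priority" rule here is one simple reading of the abstract (a thread keeps running while its most recent access hit and yields on a miss); the paper's exact rule may differ.

```python
from collections import deque


class DirectMappedCache:
    """Toy shared direct-mapped cache; lines are tagged per thread (an assumption)."""

    def __init__(self, num_lines, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.accesses = 0

    def access(self, thread_id, address):
        block = address // self.line_size
        index = block % self.num_lines
        tag = (thread_id, block // self.num_lines)
        self.accesses += 1
        hit = self.tags[index] == tag
        if hit:
            self.hits += 1
        else:
            self.tags[index] = tag  # fill the line on a miss
        return hit

    def hit_ratio(self):
        return self.hits / self.accesses if self.accesses else 0.0


def simulate(traces, cache, policy="round_robin"):
    """Replay one address trace per thread under the given scheduling policy."""
    pending = deque((tid, deque(t)) for tid, t in enumerate(traces) if t)
    while pending:
        tid, trace = pending[0]
        hit = cache.access(tid, trace.popleft())
        if not trace:
            pending.popleft()      # thread finished its trace
        elif policy == "round_robin" or not hit:
            pending.rotate(-1)     # move the current thread to the back
        # "mru_priority" with a hit: the same thread stays at the front
    return cache.hit_ratio()


# Synthetic traces: thread 1's last two accesses conflict with thread 0's line.
traces = [[0, 4, 8, 12], [16, 20, 64, 68]]
print(simulate(traces, DirectMappedCache(4, 16), "round_robin"))   # 0.375
print(simulate(traces, DirectMappedCache(4, 16), "mru_priority"))  # 0.625
```

On these hand-built traces, round-robin interleaving lets the two threads evict each other's line, while the hit-priority rule lets thread 0 finish its run of hits first. The numbers say nothing about the paper's measured 7% figure; they only show the mechanism on a contrived input.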
