Paper information - Increasing Cache Port Efficiency for Dynamic Superscalar Microprocessors

The memory bandwidth demands of modern microprocessors require the use of a multi-ported cache to achieve peak performance. However, multi-ported caches are costly to implement. In this paper we propose techniques for improving the bandwidth of a single cache port by using additional buffering in the processor, and by taking maximum advantage of a wider cache port. We evaluate these techniques using realistic applications that include the operating system. Our techniques using a single-ported cache achieve 91% of the performance of a dual-ported cache.
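As a rough illustration of why extra buffering in the processor might relieve pressure on a single cache port, the toy model below counts how many loads in an address trace would need the port when a small buffer of recently fetched lines sits in front of it. This is only a sketch under stated assumptions, not the mechanism evaluated in the paper: the function name `count_port_accesses`, the 32-byte line size, the LRU replacement policy, and the sample trace are all hypothetical, and stores, timing, and coherence are ignored.

```python
from collections import OrderedDict


def count_port_accesses(addresses, line_bytes=32, buffer_lines=0):
    """Count the loads in `addresses` that must use the cache port.

    Hypothetical toy model: a load whose cache line is already held in a
    small LRU buffer inside the processor is served from that buffer and
    does not occupy the port. With buffer_lines=0 every load uses the port.
    """
    buffer = OrderedDict()  # line number -> None, ordered oldest to newest
    port_accesses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in buffer:
            # Served from the buffer: refresh its LRU position, skip the port.
            buffer.move_to_end(line)
            continue
        port_accesses += 1  # this load has to go through the cache port
        if buffer_lines > 0:
            buffer[line] = None
            if len(buffer) > buffer_lines:
                buffer.popitem(last=False)  # evict the least recently used line
    return port_accesses


# Assumed sample trace of byte addresses (illustrative only).
trace = [0, 8, 16, 64, 72, 0, 4]
for size in (0, 1, 2):
    print(size, count_port_accesses(trace, buffer_lines=size))
```

In this made-up trace, most loads fall in lines that were touched recently, so even a one- or two-entry buffer removes a large share of the port accesses. Real workloads would need a cycle-accurate simulation to say anything about performance.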
