High Bandwidth On-Chip Cache Design

In this paper, we evaluate the performance of high-bandwidth cache organizations that employ multiple cache ports, multiple-cycle hit times, and cache port efficiency enhancements, such as load all and line buffer, to find the organization that provides the best processor performance. We measure processor performance by execution time on a dynamic superscalar processor running realistic benchmarks that include operating system references. Relative to a cache limited to a single port without enhancements, we find that two cache ports increase processor performance by 25 percent. With the addition of line buffer and load all to a single-ported cache, the processor achieves 91 percent of the performance of the same processor containing a cache with two ports. When the processor is not limited to a single cache port, the results show that a large dual-ported multicycle pipelined SRAM cache with a line buffer maximizes processor performance. A large pipelined cache provides both a low miss rate and a high CPU clock frequency. Dual-porting the cache and using a line buffer provide the bandwidth needed by a dynamic superscalar processor. The line buffer makes the pipelined dual-ported cache the best option by increasing cache port bandwidth and hiding cache latency.
