Aspects of cache memory and instruction buffer performance

This dissertation develops techniques for efficiently evaluating direct-mapped and set-associative caches. The techniques are used to study associativity in CPU caches and to examine instruction caches for single-chip RISC microprocessors. The research is motivated in general by the importance of cache memories to computer performance, and more specifically by work on the design of the caches in SPUR, a multiprocessor workstation designed at U.C. Berkeley. The studies consider not only abstract measures of performance such as miss ratios but also, where appropriate, detailed implementation factors such as access times and gate delays. The simulation algorithms developed compute miss ratios for numerous alternative caches in one pass through an address trace, provided that all caches have the same block size and use demand fetching and LRU replacement. One algorithm (forest simulation) simulates direct-mapped caches by relying on inclusion, the property that every larger cache contains a superset of the data in each smaller cache. The other algorithm (all-associativity simulation) simulates a broader class of direct-mapped and set-associative caches than could previously be studied with a one-pass algorithm, although somewhat less efficiently than forest simulation, since inclusion does not hold. The analysis of set-associative caches yields two major results. First, constant factors are obtained that relate the miss ratios of set-associative caches to the miss ratios of other set-associative caches. Second, those results are combined with sample cache implementations to show that, above certain cache sizes, direct-mapped caches have lower effective access times than set-associative caches, despite having higher miss ratios.
Finally, instruction buffers and target instruction buffers are examined as organizations for instruction memory on single-chip microprocessors. The analysis focuses closely on implementation considerations, including the interaction between instruction fetches, instruction prefetches, and data references, and uses the SPUR RISC design as the case study. The results show the effects of varying numerous design parameters, suggest some superior designs, and demonstrate that instruction buffers will be preferred to target instruction buffers in future RISC microprocessors implemented on single CMOS chips.
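
The one-pass idea behind simulating several direct-mapped caches at once can be illustrated with a minimal Python sketch. It is not the dissertation's forest simulation algorithm, only a simplified illustration of how inclusion lets a search stop early. It assumes that all caches share one block size, that the set index is the low-order bits of the block number, and that the numbers of sets are powers of two. The function name `count_direct_mapped_misses` and its parameters are hypothetical.

```python
def count_direct_mapped_misses(trace, block_size, set_counts):
    """Count misses for several direct-mapped caches in one pass over a trace.

    trace      -- iterable of byte addresses (non-negative integers)
    block_size -- block size in bytes, shared by all simulated caches
    set_counts -- numbers of sets (blocks) per cache; assumed powers of two

    Returns a dict mapping each set count to its miss count.
    """
    sizes = sorted(set(set_counts))
    # One tag array per simulated cache; None marks an empty slot.
    caches = [[None] * n for n in sizes]
    misses = [0] * len(sizes)

    for addr in trace:
        block = addr // block_size
        # Probe from the smallest cache to the largest.
        for level, n in enumerate(sizes):
            slot = block % n  # index taken from the low-order block bits
            if caches[level][slot] == block:
                # Inclusion: a hit here implies a hit in every larger
                # cache, and a hit leaves those caches unchanged.
                break
            misses[level] += 1
            caches[level][slot] = block  # demand fetch into this cache

    return dict(zip(sizes, misses))


if __name__ == "__main__":
    example_trace = [0, 16, 0, 64, 0, 32, 16, 128, 0]
    print(count_direct_mapped_misses(example_trace, 16, [1, 2, 4]))
```

Because a hit in a small cache ends the probe, the work per reference tends to shrink as hit rates rise, whereas simulating each cache independently always touches every cache.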
