Cache Memories

design issues. Specific aspects of cache memories that are investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size. Our discussion includes other aspects of memory system architecture, including translation lookaside buffers. Throughout the paper, we use as examples the implementation of the cache in the Amdahl 470V/6 and 470V/7, the IBM 3081, 3033, and 370/168, and the DEC VAX 11/780. An extensive bibliography is provided.
