Caches as filters: a framework for the analysis of caching systems

This paper introduces a new framework for analyzing and designing caches. It consists of four major parts: TSpec notation, into which reference traces can be transformed; equivalence classes, which abstract away chance effects of address bindings and specific inputs; the functional filter model, which operates on TSpec traces and provides a formal description of cache operation; and new metrics, which evaluate cache performance. This paper gives an overview of TSpec notation and equivalence classes, and then illustrates how the functional filter model can be used to gain a better understanding of cache behavior.
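
To make the "cache as filter" view concrete, the sketch below treats a cache as a function from a reference stream to the stream of misses it passes on to the next level. It is only an illustration of the general idea: it works on raw integer addresses rather than TSpec traces, it is not the paper's formal filter model, and the function name `direct_mapped_filter`, the parameters `num_lines` and `line_size`, and the example trace are all assumptions made here.

```python
from typing import Iterable, Iterator, List, Optional


def direct_mapped_filter(
    trace: Iterable[int], num_lines: int = 4, line_size: int = 16
) -> Iterator[int]:
    """Yield the block-aligned address of every reference that misses.

    Hypothetical illustration: a direct-mapped cache seen as a filter
    that removes hits from the reference stream and lets misses through.
    """
    # One slot per cache line; each slot remembers the block it holds.
    resident: List[Optional[int]] = [None] * num_lines
    for addr in trace:
        block = addr // line_size      # which memory block is referenced
        index = block % num_lines      # which cache line that block maps to
        if resident[index] != block:   # miss: fetch the block, emit it
            resident[index] = block
            yield block * line_size
        # hit: the reference is absorbed and nothing is emitted


# Assumed example trace of byte addresses.
trace = [0, 4, 8, 64, 0, 68, 16]
misses = list(direct_mapped_filter(trace))
print(misses)  # [0, 64, 0, 64, 16]
```

Addresses 0 and 64 map to the same line in this configuration, so they keep evicting each other; the output stream makes that conflict visible, whereas a single miss count would hide it. Because the output is itself a reference stream, a second filter of this kind could be applied to it to model the next level of the hierarchy.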
