Investigating the Use of Cache as a Local Memory

When caches were first designed, they were very small by today's standards, and the components from which they were constructed were very expensive. For many of today's computer architectures, this is no longer true. Caches have become quite large, some as large as a megabyte, and there is good reason to ask whether the structure of a conventional cache was determined by the size and cost factors that prevailed when caches were first developed. If so, different structures may be more appropriate for today's caches. Our objective is to obtain a deeper understanding of program behavior that can be exploited to optimize program and data layout, and to motivate architectural features aimed at the selective exploitation of locality. In this paper, we present observations from several profiling experiments that support fully or partially replacing the cache with a local memory that houses the heavily used references. We also present a memory structure called the SemiCache Architecture, which replaces the cache with a combination of cache and local memory, and we present simulation results that demonstrate the feasibility of the scheme.
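The abstract does not say how the SemiCache decides which references live in local memory, so the following is only a minimal sketch under stated assumptions. It assumes a static, profile-guided placement: a profiling pass counts references per block and pins the most frequently used blocks in local memory, while every other reference goes through a direct-mapped cache. The block size, capacities, function names (`profile_hot_blocks`, `simulate`) and the synthetic trace are all hypothetical illustrations, not the paper's design or data.

```python
from collections import Counter
from typing import AbstractSet, Iterable, List, Optional, Set, Tuple

BLOCK_SIZE = 16  # bytes per block (assumed for illustration, not from the paper)


def block_of(addr: int, block_size: int = BLOCK_SIZE) -> int:
    """Map a byte address to its block number."""
    return addr // block_size


def profile_hot_blocks(trace: Iterable[int], local_blocks: int,
                       block_size: int = BLOCK_SIZE) -> Set[int]:
    """Profiling pass (assumed policy): pick the `local_blocks` most
    frequently referenced blocks to be placed in local memory."""
    counts = Counter(block_of(a, block_size) for a in trace)
    return {blk for blk, _ in counts.most_common(local_blocks)}


def simulate(trace: Iterable[int], num_lines: int,
             local: AbstractSet[int] = frozenset(),
             block_size: int = BLOCK_SIZE) -> Tuple[int, int, int]:
    """Replay `trace`: blocks in `local` are served by local memory, all other
    references go through a direct-mapped cache with `num_lines` lines.
    Returns (cache_hits, cache_misses, local_memory_hits)."""
    tags: List[Optional[int]] = [None] * num_lines
    hits = misses = local_hits = 0
    for addr in trace:
        blk = block_of(addr, block_size)
        if blk in local:        # heavily used reference: always resident locally
            local_hits += 1
            continue
        idx = blk % num_lines   # direct-mapped index
        if tags[idx] == blk:
            hits += 1
        else:
            misses += 1
            tags[idx] = blk     # fill the line on a miss
    return hits, misses, local_hits


# Hypothetical trace: two hot blocks (addresses 0 and 128) that collide in an
# 8-line direct-mapped cache, interleaved with streaming references used once.
trace: List[int] = []
for i in range(100):
    trace += [0, 128, BLOCK_SIZE * (1000 + i)]

# Same total capacity (8 blocks) in both configurations.
conventional = simulate(trace, num_lines=8)
hot = profile_hot_blocks(trace, local_blocks=2)
semicache = simulate(trace, num_lines=6, local=hot)
print(conventional)  # (0, 300, 0)
print(semicache)     # (0, 100, 200)
```

The trace is deliberately built so that the two hot blocks evict each other in the conventional cache; pinning them in local memory removes those conflict misses, while the streaming references miss in either design. For simplicity the sketch profiles and evaluates on the same trace; a realistic study would profile on one run and measure on another, and actual gains depend entirely on the workload.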
