A modified approach to data cache management

As processor performance continues to improve, more emphasis must be placed on the performance of the memory system. In this paper, we give a detailed characterization of data cache behavior for individual load instructions. We show that by selectively applying cache line allocation according to the characteristics of individual load instructions, overall performance can be improved for both the data cache and the memory system. This approach can improve some aspects of memory performance by as much as 60 percent on existing executables.
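The following is a rough Python sketch of the idea behind selective allocation. It assumes a toy direct-mapped cache and a trace of (load PC, address) pairs. A profiling pass records each load's miss rate under normal allocation. The trace is then replayed with loads above a hypothetical 0.5 miss-rate threshold marked non-allocating: they still go to memory on a miss but never displace a cache line. The cache geometry, the threshold, the trace, and the names `DirectMappedCache`, `profile_miss_rates`, and `simulate_selective` are illustrative assumptions. They are not the paper's mechanism or parameters.

```python
from collections import defaultdict

# Toy cache geometry (assumed, not taken from the paper).
LINE_SIZE = 32   # bytes per cache line
NUM_LINES = 256  # number of lines in a direct-mapped cache


class DirectMappedCache:
    """Direct-mapped cache that records only hits and misses (no data)."""

    def __init__(self, num_lines=NUM_LINES, line_size=LINE_SIZE):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, addr, allocate=True):
        """Return True on a hit. On a miss, fill the line only if allocate is True."""
        block = addr // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        if allocate:
            self.tags[index] = tag  # normal policy: the missing line evicts the old one
        return False


def profile_miss_rates(trace, **cache_kw):
    """Profiling pass: run with normal allocation and return each load PC's miss rate."""
    cache = DirectMappedCache(**cache_kw)
    refs = defaultdict(int)
    misses = defaultdict(int)
    for pc, addr in trace:
        refs[pc] += 1
        if not cache.access(addr):
            misses[pc] += 1
    return {pc: misses[pc] / refs[pc] for pc in refs}


def simulate_selective(trace, bypass_pcs, **cache_kw):
    """Replay the trace; loads whose PC is in bypass_pcs never allocate a line."""
    cache = DirectMappedCache(**cache_kw)
    for pc, addr in trace:
        cache.access(addr, allocate=pc not in bypass_pcs)
    return cache


# Hypothetical trace of (load PC, byte address) pairs for a 4-line cache with 4-byte lines.
# Load 0x100 reuses one line; load 0x200 streams through lines that map to the same set.
geometry = {"num_lines": 4, "line_size": 4}
trace = []
for i in range(1, 101):
    for offset in range(4):
        trace.append((0x100, offset))
    trace.append((0x200, 16 * i))

rates = profile_miss_rates(trace, **geometry)
THRESHOLD = 0.5  # hypothetical cutoff for marking a load non-allocating
bypass = {pc for pc, rate in rates.items() if rate > THRESHOLD}

baseline = simulate_selective(trace, set(), **geometry)
selective = simulate_selective(trace, bypass, **geometry)
print("miss rates:", {hex(pc): rate for pc, rate in rates.items()})
print("baseline hits/misses:", baseline.hits, baseline.misses)    # 300 200
print("selective hits/misses:", selective.hits, selective.misses)  # 399 101
```

In this toy trace the streaming load evicts the reused line on every iteration. Once the streaming load stops allocating, the reused load hits after its first cold miss. The model counts only hits and misses. It does not model memory-bus traffic or latency.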
